An implementation of the "Text Classification using String Kernels" publication by Lodhi et al. The code is written mainly in Python, with some parts moved to Cython for performance. The final report can be found here.
This project was carried out as part of the DD2434 "Advanced Machine Learning" course at KTH Royal Institute of Technology.
- F. Franzen (github: flammi)
- B. Godefroy (github: BGodefroyFR)
- W. Kryściński (github: muggin)
- V. Polianskii (github: vlpolyansky)
Files in the data directory:
- train_data and test_data - original Reuters dataset split (Modified Apte) (Pickled)
- train_data_clean and test_data_clean - preprocessed and cleaned dataset (Pickled)
- train_data_small and test_data_small - trimmed dataset prepared for experiments (Pickled)
- precomp_kernels - directory with precomputed SSK Gram matrices
- approx - directory with precomputed approximated-SSK files
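The files marked (Pickled) can presumably be read back with Python's standard pickle module. The snippet below is a minimal sketch, not part of the repository: the helper name load_pickled and the example path data/train_data are assumptions, and nothing is assumed about the structure of the loaded object.

```python
import pickle


def load_pickled(path, encoding="ASCII"):
    """Load one pickled file and return whatever object it contains.

    Hypothetical helper for illustration; it is not part of the repository.
    Only unpickle files you trust, since unpickling can execute code.
    """
    with open(path, "rb") as handle:
        # `encoding` only matters when reading pickles written by Python 2.
        return pickle.load(handle, encoding=encoding)


# Example usage (the path is an assumption about the repository layout):
# train = load_pickled("data/train_data")
# print(type(train))
```

If the files were written under Python 2 and loading fails with a decoding error under Python 3, passing encoding="latin1" may help.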
Before using the SSK kernel, compile the Cython code using:
python setup.py build_ext --inplace
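For orientation, the string subsequence kernel (SSK) itself can be sketched in pure Python. This is a minimal illustration of the dynamic-programming recursion described by Lodhi et al., not the repository's Cython code. The function names ssk and ssk_normalized, and the parameter names n (subsequence length) and lam (decay factor), are chosen here for illustration only.

```python
import math


def ssk(s, t, n, lam):
    """Unnormalized SSK between strings s and t for subsequences of length n.

    Illustrative sketch only; names and signature are hypothetical.
    """
    ls, lt = len(s), len(t)
    if min(ls, lt) < n:
        return 0.0
    # kp[i][a][b] holds the auxiliary kernel K'_i(s[:a], t[:b]).
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0  # K'_0 is 1 for every pair of prefixes
    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running value of K''_i(s[:a], t[:b])
            for b in range(i, lt + 1):
                match = s[a - 1] == t[b - 1]
                kpp = lam * kpp + (lam * lam * kp[i - 1][a - 1][b - 1] if match else 0.0)
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    # Sum over all pairs of positions where the last characters coincide.
    total = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


def ssk_normalized(s, t, n, lam):
    """SSK scaled so that a string compared with itself scores 1."""
    denom = math.sqrt(ssk(s, s, n, lam) * ssk(t, t, n, lam))
    return ssk(s, t, n, lam) / denom if denom > 0 else 0.0
```

With n = 2, the strings "cat" and "car" share only the subsequence "ca", so ssk("cat", "car", 2, lam) evaluates to lam**4 and the normalized value to 1 / (2 + lam**2), matching the worked example in the paper. The triple loop costs O(n * |s| * |t|) time, which is slow in pure Python for long documents.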