| Zalan Bodo|
|Co-authors||Lehel Csato, Zsolt Minier|
|Authorship||Publications (2), datasets (0), tools (0)|
|Citations||Total (0), average (0), median (0), max (0), min (0)|
Zalan Bodo is an author.
Publications
Only those publications related to wikis are shown here.
|Title||Keyword(s)||Published in||Language||Date||Abstract||R||C|
|Wikipedia-Based Kernels for Text Categorization||||||English||2007||||0||0|
|Wikipedia-based Kernels for text categorization||||Proceedings - 9th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2007||English||2007||In recent years several models have been proposed for text categorization. Within this, one of the widely applied models is the vector space model (VSM), where independence between indexing terms, usually words, is assumed. Since training corpora sizes are relatively small - compared to ≈ ∞ what would be required for a realistic number of words - the generalization power of the learning algorithms is low. It is assumed that a bigger text corpus can boost the representation and hence the learning process. Based on the work of Gabrilovich and Markovitch, we incorporate Wikipedia articles into the system to give word distributional representation for documents. The extension with this new corpus causes dimensionality increase, therefore clustering of features is needed. We use Latent Semantic Analysis (LSA), Kernel Principal Component Analysis (KPCA) and Kernel Canonical Correlation Analysis (KCCA) and present results for these experiments on the Reuters corpus.||0||0|