Browse wiki

Comparative analysis of text representation methods using classification
Abstract In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article - evaluation of approaches to text representation for machine learning tasks - indicates that the text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot be compensated for even by sophisticated machine learning algorithms. It confirms the thesis that proper data representation is a prerequisite for achieving high-quality results of data analysis. Evaluation of the text representations was performed within the Wikipedia repository by examination of classification parameters observed during automatic reconstruction of human-made categories. For that purpose, we use a classifier based on a support vector machines method, extended with multilabel and multiclass functionalities. During classifier construction we observed parameters such as learning time, representation size, and classification quality that allow us to draw conclusions about text representations. For the experiments presented in the article, we use data sets created from Wikipedia dumps. We describe our software, called Matrixu, which allows a user to build computational representations of Wikipedia articles. The software is the second contribution of our research, because it is a universal tool for converting Wikipedia from a human-readable form to a form that can be processed by a machine. Results generated using Matrixu can be used in a wide range of applications that involve usage of Wikipedia data.
Bibtextype article  +
Doi 10.1080/01969722.2014.874828  +
Has author Szymanski J. +
Has extra keyword Automatic reconstruction + , Classification parameters + , Classification quality + , Documents categorization + , Sophisticated machines + , Text classification + , Text representation + , Wikipedia + , Classification (of information) + , Information retrieval + , Learning algorithms + , Learning systems + , Text processing +
Has keyword Documents categorization + , Information retrieval + , Text classification + , Text representation + , Wikipedia +
Issn 0196-9722  +
Issue 2  +
Language English +
Number of citations by publication 0  +
Number of references by publication 0  +
Pages 180–199  +
Published in Cybernetics and Systems +
Title Comparative analysis of text representation methods using classification +
Type literature review  +
Volume 45  +
Year 2014 +
Creation date 7 November 2014 02:43:46  +
Categories Publications without license parameter  + , Publications without remote mirror parameter  + , Publications without archive mirror parameter  + , Publications without paywall mirror parameter  + , Literature reviews  + , Publications without references parameter  + , Publications  +
Modification date 7 November 2014 02:43:46  +
Date 2014  +