On empirical tradeoffs in large scale hierarchical classification
|Author(s)||Babbar R., Partalas I., Gaussier E., Amblard C.|
|Published in||ACM International Conference Proceeding Series|
|Keyword(s)||empirical tradeoffs, hierarchical classification (Extra: Automatic classification, Classification system, empirical tradeoffs, Error bound, Heterogeneity, Hierarchical classification, Large scale hierarchies, Learning Theory, Mozilla, Multi-class categorization, Test instances, Test time, Text classification, Text document, Top-down, Training time, Wikipedia, Commerce, Error analysis, Knowledge management, Classification (of information))|
On empirical tradeoffs in large scale hierarchical classification is a 2012 conference paper written in English by Babbar R., Partalas I., Gaussier E., Amblard C. and published in ACM International Conference Proceeding Series.
While multi-class categorization of documents has been of research interest for over a decade, relatively few approaches have been proposed for large scale taxonomies, in which the number of classes ranges from hundreds of thousands, as in Directory Mozilla, to over a million in Wikipedia. As a result of the ever-increasing number of text documents and images from various sources, there is an immense need for automatic classification of documents in such large hierarchies. In this paper, we analyze the tradeoffs between the important characteristics of different classifiers employed in a top-down fashion. The properties for relative comparison of these classifiers include (i) accuracy on test instances, (ii) training time, (iii) size of the model, and (iv) test time required for prediction. Our analysis is motivated by the well-known error bounds from learning theory, and is further reinforced by empirical observations on the publicly available data from the Large Scale Hierarchical Text Classification Challenge. We show that by exploiting the data heterogeneity across the large scale hierarchies, one can build an overall classification system which is approximately 4 times faster for prediction and 3 times faster to train, while sacrificing only 1 percentage point in accuracy.
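The top-down scheme the abstract refers to can be illustrated with a minimal sketch: a local classifier at each internal node of the taxonomy routes a document to one child, and the descent repeats until a leaf class is reached. The taxonomy, the keyword-count scorers, and the category names below are illustrative assumptions for the sketch, not the paper's actual models or data.

```python
# Illustrative top-down classification over a small class taxonomy.
# The hierarchy, scorers, and labels are toy assumptions, not from the paper.

class Node:
    def __init__(self, label, children=None, scorer=None):
        self.label = label              # category name at this node
        self.children = children or []  # child Nodes; empty list => leaf class
        self.scorer = scorer            # scorer(doc) -> {child_label: score}

def predict_top_down(root, doc):
    """Descend the hierarchy, at each internal node following the child
    whose local classifier scores the document highest."""
    node = root
    while node.children:
        scores = node.scorer(doc)
        node = max(node.children, key=lambda child: scores[child.label])
    return node.label

# Toy two-level hierarchy: root -> {science, sports} -> leaf classes.
def root_scorer(doc):
    return {"science": doc.count("quantum"), "sports": doc.count("match")}

def science_scorer(doc):
    return {"physics": doc.count("quantum"), "biology": doc.count("cell")}

def sports_scorer(doc):
    return {"football": doc.count("goal"), "tennis": doc.count("serve")}

root = Node("root", scorer=root_scorer, children=[
    Node("science", scorer=science_scorer,
         children=[Node("physics"), Node("biology")]),
    Node("sports", scorer=sports_scorer,
         children=[Node("football"), Node("tennis")]),
])

print(predict_top_down(root, "quantum quantum match"))  # physics
```

Because each document only visits one path from the root to a leaf, prediction cost grows with the depth of the taxonomy rather than the total number of classes, which is the source of the test-time savings the paper analyzes.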
This publication is likely cited by other works, but no articles citing it are currently available in WikiPapers.