A novel weighting scheme for efficient document indexing and classification

From WikiPapers

A novel weighting scheme for efficient document indexing and classification is a 2010 conference paper written in English by Tahayna B., Ayyasamy R.K., Alhashmi S. and Eu-Gene S., and published in Proceedings 2010 International Symposium on Information Technology - Engineering Technology, ITSim'10.

Abstract

In this paper we propose a new topic-based document classification method and illustrate its effectiveness. The proposed method uses Wikipedia, a large-scale Web encyclopaedia with a huge number of high-quality articles and a category system. An n-gram technique maps the document onto Wikipedia, transforming it from a "bag of words" into a "bag of concepts". Based on this transformation, a novel concept-based weighting scheme (denoted Conf.idf) is proposed to index the text in the flavor of the traditional tf.idf indexing scheme. Moreover, a genetic algorithm-based support vector machine optimization method is used for feature subset and instance selection. Experimental results show that the proposed weighting scheme outperforms the traditional indexing and weighting scheme.
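
The abstract does not give the Conf.idf formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a tiny hypothetical dictionary (`CONCEPTS`) standing in for Wikipedia article titles, a greedy longest-match n-gram lookup, and the plain tf.idf formula applied to concept counts instead of word counts. The function names `to_concepts` and `concept_tfidf` are illustrative.

```python
import math
from collections import Counter

# Hypothetical stand-in for Wikipedia article titles (assumption, not real data).
CONCEPTS = {
    ("support", "vector", "machine"): "Support vector machine",
    ("genetic", "algorithm"): "Genetic algorithm",
    ("wikipedia",): "Wikipedia",
}
MAX_N = max(len(key) for key in CONCEPTS)


def to_concepts(text):
    """Turn a bag of words into a bag of concepts via longest n-gram match."""
    tokens = text.lower().split()
    bag = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest n-gram first, then shorter ones.
        for n in range(min(MAX_N, len(tokens) - i), 0, -1):
            concept = CONCEPTS.get(tuple(tokens[i:i + n]))
            if concept is not None:
                bag[concept] += 1
                i += n
                break
        else:
            i += 1  # no concept starts at this token; skip it
    return bag


def concept_tfidf(docs):
    """Weight each concept as (concept frequency) * log(N / document frequency).

    This is ordinary tf.idf over concepts; the paper's Conf.idf may differ.
    """
    bags = [to_concepts(doc) for doc in docs]
    doc_freq = Counter(concept for bag in bags for concept in bag)
    n_docs = len(bags)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            concept: (count / total) * math.log(n_docs / doc_freq[concept])
            for concept, count in bag.items()
        })
    return weights


docs = [
    "a genetic algorithm tunes a support vector machine",
    "wikipedia describes the genetic algorithm",
]
weights = concept_tfidf(docs)
```

In this toy corpus, "Genetic algorithm" occurs in both documents and so receives weight zero, while concepts unique to one document receive a positive weight, which mirrors how tf.idf discounts terms that do not discriminate between documents.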

Cited by

This publication has been cited 6 times, but WikiPapers has no articles for the citing works.