Study of ontology or thesaurus based document clustering and information retrieval

Study of ontology or thesaurus based document clustering and information retrieval is a 2012 journal article written in English by Bharathi G. and Venkatesan D., published in the Journal of Theoretical and Applied Information Technology.

Abstract

Document clustering generates clusters from the whole document collection automatically and is used in many fields, including data mining and information retrieval. Clustering text data faces a number of new challenges, the most important being the volume of text data, dimensionality, sparsity and complex semantics. These characteristics require clustering techniques that scale to large, high-dimensional data and that can handle sparsity and semantics. In the traditional vector space model, the unique words occurring in the document set are used as the features. But because of the synonymy problem and the polysemy problem, such a bag of original words cannot represent the content of a document precisely. Most existing text clustering methods depend only on term strength and document frequency: single terms are used as features for representing the documents and are treated independently, an approach that applies readily to non-ontological clustering. To overcome these issues, this paper surveys recent research on ontology or thesaurus based document clustering.
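
The synonym problem described in the abstract can be illustrated with a minimal sketch. It is not taken from the article: the toy thesaurus, the two sample documents and the helper names (bag_of_words, bag_of_concepts, cosine) are all hypothetical, chosen only to show how mapping words to thesaurus concepts changes similarity in a vector space model.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus (an assumption for illustration only):
# each surface word maps to a shared concept label.
THESAURUS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "physician": "doctor",
    "doctor": "doctor",
}


def bag_of_words(text):
    """Represent a document by counts of its raw, lowercased words."""
    return Counter(text.lower().split())


def bag_of_concepts(text, thesaurus=THESAURUS):
    """Represent a document by counts of thesaurus concepts.

    Words missing from the thesaurus are kept unchanged.
    """
    return Counter(thesaurus.get(word, word) for word in text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two documents with the same meaning but no words in common.
doc1 = "physician buys automobile"
doc2 = "doctor purchases car"

word_sim = cosine(bag_of_words(doc1), bag_of_words(doc2))
concept_sim = cosine(bag_of_concepts(doc1), bag_of_concepts(doc2))

print(f"bag of words similarity:    {word_sim:.3f}")
print(f"bag of concepts similarity: {concept_sim:.3f}")
```

With raw words the two documents share no features, so their similarity is zero; after the thesaurus mapping they share the concepts "doctor" and "vehicle". Note that a plain lookup table like this one does not address polysemy, which needs some form of word sense disambiguation.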

Cited by

This publication is probably cited by others, but WikiPapers has no articles for them. Cited 1 time.


Study of ontology or thesaurus based document clustering and information retrieval is a 2012 journal article written in English by Bharathi G. and Venkatesan D., published in the Journal of Engineering and Applied Sciences.

Abstract

Document clustering generates clusters from the whole document collection automatically and is used in many fields, including data mining and information retrieval. Clustering text data faces a number of new challenges. Among others, the volume of text data, dimensionality, sparsity and complex semantics are the most important ones. These characteristics of text data require clustering techniques to be scalable to large and high dimensional data, and able to handle sparsity and semantics. In the traditional vector space model, the unique words occurring in the document set are used as the features. But because of the synonym problem and the polysemous problem, such a bag of original words cannot represent the content of a document precisely. Most of the existing text clustering methods use clustering techniques which depend only on term strength and document frequency where single terms are used as features for representing the documents and they are treated independently which can be easily applied to non-ontological clustering. To overcome these issues, this study makes a survey of recent research done on ontology or thesaurus based document clustering.

Cited by

This publication is probably cited by others, but WikiPapers has no articles for them.