Automatic Ontology Extraction for Document Classification

From WikiPapers

Automatic Ontology Extraction for Document Classification is a 2006 master's thesis by Natalia Kozlova, submitted to Saarland University.

Abstract

The amount of information in the world is enormous. Millions of documents sit in electronic libraries, and thousands more on every personal computer, waiting for an expert to organize them into appropriate categories. Automatic classification can help, but it usually runs into problems of synonymy, polysemy, and varying word usage patterns. Modern knowledge representation mechanisms such as ontologies can address these issues. Ontology-driven classification is a powerful technique that combines the advantages of modern classification methods with the semantic specificity of ontologies. A key obstacle is the cost and difficulty of building the ontology, especially if it is not to be tied to any specific field. Creating a generally applicable yet simple ontology is a challenging task; even manually compiled thesauri such as WordNet can be overcrowded and noisy. We propose a flexible framework for efficient ontology extraction for document classification, and we develop a set of ontology extraction rules. The framework was tested on Wikipedia, the free encyclopedia, as a manually created corpus. We present a software tool developed according to these principles; its architecture is open to embedding new features. Ontology-driven document classification experiments were performed on the Reuters collection, studying the behavior of different classifiers on different ontologies under varying experimental setups. The experiments show that our system performs better than the other approaches compared. Finally, we assess the potential of automatic ontology extraction techniques and highlight directions for further investigation.
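The abstract does not spell out the extraction rules or the classifiers used, so the following is only a hypothetical Python sketch of the general idea behind ontology-driven classification: words are mapped to ontology concepts before classification, so synonyms collapse into a single feature. The toy ontology `TOY_ONTOLOGY`, the helper functions, the training snippets, and the nearest-centroid classifier with cosine similarity are all illustrative assumptions, not the thesis's framework or its Wikipedia-derived ontology.

```python
from collections import Counter
import math

# Hypothetical toy ontology (an assumption for illustration): surface terms -> concepts.
# Synonyms share a concept, so "car" and "automobile" become one feature.
TOY_ONTOLOGY = {
    "car": "Vehicle", "automobile": "Vehicle", "truck": "Vehicle",
    "stock": "Finance", "share": "Finance", "equity": "Finance",
    "wheat": "Grain", "corn": "Grain", "maize": "Grain",
}

def to_concepts(text: str) -> Counter:
    """Map a document to a bag of concepts; terms absent from the ontology are dropped."""
    tokens = text.lower().split()
    return Counter(TOY_ONTOLOGY[t] for t in tokens if t in TOY_ONTOLOGY)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse concept vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(labelled):
    """Sum concept vectors per category; cosine ignores scale, so no averaging is needed."""
    centroids = {}
    for text, label in labelled:
        centroids.setdefault(label, Counter()).update(to_concepts(text))
    return centroids

def classify(text, centroids):
    """Return the category whose centroid is most similar to the document's concepts."""
    doc = to_concepts(text)
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))

# Hypothetical training snippets standing in for labelled news items.
TRAIN = [
    ("car truck sales rise", "autos"),
    ("stock share prices fall", "markets"),
    ("wheat corn harvest", "grain"),
]
centroids = train_centroids(TRAIN)
print(classify("automobile exports grow", centroids))  # "automobile" never occurs in TRAIN
```

Because `automobile` never appears in the training text, a plain bag-of-words model would share no feature with the `autos` category; the concept mapping supplies the match. The sketch does not attempt to disambiguate polysemous terms, and a document containing no known concept simply falls back to the first category, so a real system would need more than this.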

