A new approach for building domain-specific corpus with wikipedia
|Author(s)||Zhang X.Y., Li X., Ruan Z.J.|
|Published in||Applied Mechanics and Materials|
|Keyword(s)||Domain-specific corpus, Kosaraju algorithm based, Multi-root method, Wikipedia (Extra: Domain ontologies, Domain specific, Experimental evaluation, Kosaraju algorithms, New approaches, Topological sort, Directed graphs, Information science, Algorithms)|
A domain-specific corpus can be used to build a domain ontology, which is used in many areas such as information retrieval (IR), natural language processing (NLP), and web mining. We propose a multi-root method for building a domain-specific corpus from Wikipedia resources. First, we select some top-level nodes (Wikipedia category articles) as root nodes and traverse Wikipedia with a BFS-like algorithm. The traversal yields a directed Wikipedia graph (Wiki-graph). Next, an algorithm based mainly on Kosaraju's algorithm removes the cycles in the Wiki-graph. Finally, a topological sort is used to traverse the Wiki-graph, with ranking and filtering performed during the traversal. A node's ranking score takes into account both its own in-degree and the out-degree of its parents. The experimental evaluation shows that our method can produce a high-quality domain-specific corpus.
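The abstract names the stages of the pipeline but not their details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. The function names (`crawl`, `scc_ids`, `break_cycles`, `rank_and_filter`), the `children_of` callback, the cycle-breaking rule (inside a strongly connected component, keep only edges that lead to a deeper BFS level), the score formula (each parent splits its score evenly over its out-degree, and a node sums the shares arriving over its in-edges), and the threshold are all hypothetical choices made for illustration. Filtering is applied after the topological pass here, whereas the abstract describes it as happening during the traversal.

```python
from collections import defaultdict, deque

def crawl(roots, children_of, max_depth=3):
    """Multi-root BFS. Returns (graph, depth): child sets and BFS level per node."""
    graph, depth = defaultdict(set), {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        graph[node]                      # register the node even if it has no children
        if depth[node] >= max_depth:
            continue
        for child in children_of(node):
            graph[node].add(child)
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    return dict(graph), depth

def scc_ids(graph):
    """Kosaraju's algorithm: map every node to a representative of its SCC."""
    order, seen = [], set()
    for start in graph:                  # pass 1: record DFS finishing order
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            nxt = next((c for c in children if c not in seen), None)
            if nxt is None:
                order.append(node)
                stack.pop()
            else:
                seen.add(nxt)
                stack.append((nxt, iter(graph[nxt])))
    reverse = defaultdict(set)
    for u, vs in graph.items():
        for v in vs:
            reverse[v].add(u)
    comp = {}
    for root in reversed(order):         # pass 2: DFS on the reversed graph
        if root in comp:
            continue
        comp[root] = root
        stack = [root]
        while stack:
            u = stack.pop()
            for p in reverse[u]:
                if p not in comp:
                    comp[p] = root
                    stack.append(p)
    return comp

def break_cycles(graph, depth):
    """Assumed rule: inside an SCC, keep only edges that go to a deeper BFS level."""
    comp = scc_ids(graph)
    return {u: {v for v in vs if comp[u] != comp[v] or depth[v] > depth[u]}
            for u, vs in graph.items()}

def rank_and_filter(dag, roots, threshold):
    """Kahn-style topological pass; score formula and threshold are illustrative."""
    parents = defaultdict(set)
    for u, vs in dag.items():
        for v in vs:
            parents[v].add(u)
    pending = {n: len(parents[n]) for n in dag}
    queue = deque(n for n in dag if pending[n] == 0)
    score = {}
    while queue:
        n = queue.popleft()
        # A parent splits its score over its out-degree; a node sums over its in-edges.
        score[n] = 1.0 if n in roots else sum(score[p] / len(dag[p]) for p in parents[n])
        for c in dag[n]:
            pending[c] -= 1
            if pending[c] == 0:
                queue.append(c)
    return {n: s for n, s in score.items() if s >= threshold}

# Toy category graph (made-up data) with one cycle: Math -> Algebra -> Math.
toy = {"Math": ["Algebra", "Geometry"], "Algebra": ["Groups", "Math"],
       "Geometry": ["Groups", "Art"]}
graph, depth = crawl(["Math"], lambda n: toy.get(n, []))
kept = rank_and_filter(break_cycles(graph, depth), {"Math"}, threshold=0.3)
print(kept)
```

In this toy run the back edge from `Algebra` to `Math` is dropped, and `Art`, reachable only through one of two edges leaving `Geometry`, falls below the threshold. In a real crawl, `children_of` would look up the subcategories and pages of a Wikipedia category; that lookup is left abstract here.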