Automatic link detection: A sequence labeling approach
|Author(s)||Gardner J.J., Xiong L.|
|Published in||International Conference on Information and Knowledge Management, Proceedings|
|Keyword(s)||Data mining, Semantic web, Sequence labeling, Wikipedia (Extra: Automatic linking, Automatic links, Conditional random field, Data sets, Hyperlinking, Hyperlinks, Knowledge basis, Machine learning communities, Probabilistic framework, Sequence Labeling, Sequential data, Sub-problems, Wikipedia, Hypertext systems, Knowledge management, Semantic Web, Semantics, Labeling)|
Automatic link detection: A sequence labeling approach is a 2009 conference paper written in English by Gardner J.J. and Xiong L., published in International Conference on Information and Knowledge Management, Proceedings.
The popularity of Wikipedia and other online knowledge bases has recently produced an interest in the machine learning community for the problem of automatic linking. Automatic hyperlinking can be viewed as two subproblems: link detection, which determines the source of a link, and link disambiguation, which determines the destination of a link. Wikipedia is a rich corpus with hyperlink data provided by authors. It is possible to use this data to train classifiers to mimic the authors in some capacity. In this paper, we introduce automatic link detection as a sequence labeling problem. Conditional random fields (CRFs) are a probabilistic framework for labeling sequential data. We show that training a CRF with different types of features from the Wikipedia dataset can be used to automatically detect links with almost perfect precision and high recall. Copyright 2009 ACM.
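To make the sequence labeling framing concrete, the sketch below turns author-provided wiki links into per-token labels of the kind a CRF could be trained on. It is an illustration only: the B-LINK/I-LINK/O label set, the simplified `[[target|anchor]]` pattern, whitespace tokenization, and the name `label_tokens` are assumptions made here, not details taken from the paper.

```python
import re

# Simplified wiki link pattern: [[target]] or [[target|anchor text]].
# Assumption: real wikitext has more cases (nested markup, namespaces).
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def label_tokens(wikitext):
    """Return (token, label) pairs with labels B-LINK, I-LINK, or O."""
    labeled = []
    pos = 0
    for match in LINK_RE.finditer(wikitext):
        # Text before the link lies outside any link source.
        for tok in wikitext[pos:match.start()].split():
            labeled.append((tok, "O"))
        # The visible anchor text is the link source to detect.
        anchor = match.group(2) or match.group(1)
        for i, tok in enumerate(anchor.split()):
            labeled.append((tok, "B-LINK" if i == 0 else "I-LINK"))
        pos = match.end()
    # Remaining text after the last link.
    for tok in wikitext[pos:].split():
        labeled.append((tok, "O"))
    return labeled


example = ("[[Conditional random field|Conditional random fields]] "
           "label sequential data in [[Wikipedia]] .")
pairs = label_tokens(example)
for token, label in pairs:
    print(f"{token}\t{label}")
```

Each labeled token sequence would then serve as one training example. The link target (the first part of `[[target|anchor]]`) is discarded here because choosing it belongs to link disambiguation, the second subproblem named in the abstract.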
This publication is probably cited by others, but WikiPapers has no articles for the citing works. Cited 4 times.