Aisles through the category forest;Utilising the Wikipedia Category System for Corpus Building in Machine Learning
Abstract The World Wide Web is a continuous challenge to machine learning. Established approaches have to be enhanced and new methods developed in order to tackle the problem of finding and organising relevant information. It has often been argued that semantic classification of input documents helps to solve this task. But while approaches to supervised text categorisation perform quite well on genres found in written text, newly evolved genres on the web are much more demanding. In order to successfully develop approaches to web mining, suitable corpora are needed. However, the composition of genre- or domain-specific web corpora is still an unsolved problem. It is time-consuming to build large corpora of good quality because web pages typically lack reliable meta information. Wikipedia, along with similar approaches to collaborative text production, offers a way out of this dilemma. We examine how social tagging, as supported by the MediaWiki software, can be utilised as a source for corpus building. Further, we describe a representation format for social ontologies and present the Wikipedia Category Explorer, a tool which supports categorical views for browsing Wikipedia and constructing domain-specific corpora for machine learning.
Bibtextype inproceedings  +
Has author Rudiger Gleim + , Alexander Mehler + , Matthias Dehmer + , Olga Pustylnikov +
Has extra keyword Category system + , Category systems + , Corpus construction + , Domain specific + , Machine learning + , MediaWiki + , Meta information + , Semantic classification + , Social ontology + , Social tagging + , Unsolved problems + , Web Corpora + , Web mining + , Web page + , Wikipedia + , Written texts + , Information systems + , Mining + , Ontology + , Robot learning + , Websites + , Software agents +
Has keyword Category system + , Corpus construction + , Social tagging + , Wikipedia +
Language English +
Number of citations by publication 0  +
Number of references by publication 0  +
Pages 142–149  +
Published in Webist 2007 - 3rd International Conference on Web Information Systems and Technologies, Proceedings +
Title Aisles through the category forest;Utilising the Wikipedia Category System for Corpus Building in Machine Learning +
Type conference paper  +
Volume WIA  +
Year 2007 +
Creation date 6 November 2014 16:10:05  +
Categories Publications without license parameter  + , Publications without DOI parameter  + , Publications without remote mirror parameter  + , Publications without archive mirror parameter  + , Publications without paywall mirror parameter  + , Conference papers  + , Publications without references parameter  + , Publications  +
Modification date 6 November 2014 16:10:05  +
Date 2007  +