Patrick Gallinari

From WikiPapers

Patrick Gallinari is an author.

Publications

Only those publications related to wikis are shown here.
Title: Overview of the INEX 2009 XML mining track: Clustering and classification of XML documents
Keyword(s): Classification; Clustering; INEX; Structure and content; Wikipedia; XML document mining
Published in: Lecture Notes in Computer Science
Language: English
Date: 2010
Abstract: This report explains the objectives, datasets and evaluation criteria of both the clustering and classification tasks set in the INEX 2009 XML Mining track. The report also describes the approaches and results obtained by the different participants.
R: 0  C: 0

Title: Overview of the INEX 2008 XML Mining Track
Published in: Advances in Focused Retrieval
Date: 2009
Abstract: We describe here the XML Mining Track at INEX 2008. This track was launched to explore two main ideas: first, identifying key problems for mining semi-structured documents and new challenges of this emerging field; and second, studying and assessing the potential of machine learning techniques for dealing with generic machine learning (ML) tasks in the structured domain, i.e., classification and clustering of semi-structured documents. This year, the track focuses on the supervised classification and the unsupervised clustering of XML documents using link information. We consider a corpus of about 100,000 Wikipedia pages with the associated hyperlinks. The participants have developed models using the content information, the internal structure information of the XML documents, and also the link information between documents.
R: 0  C: 0

Title: Overview of the INEX 2008 XML mining track: Categorization and clustering of XML documents in a graph of documents
Published in: Lecture Notes in Computer Science
Language: English
Date: 2009
Abstract: We describe here the XML Mining Track at INEX 2008. This track was launched to explore two main ideas: first, identifying key problems for mining semi-structured documents and new challenges of this emerging field; and second, studying and assessing the potential of machine learning techniques for dealing with generic machine learning (ML) tasks in the structured domain, i.e., classification and clustering of semi-structured documents. This year, the track focuses on the supervised classification and the unsupervised clustering of XML documents using link information. We consider a corpus of about 100,000 Wikipedia pages with the associated hyperlinks. The participants have developed models using the content information, the internal structure information of the XML documents, and also the link information between documents.
R: 0  C: 0

Title: Machine learning for semi-structured multimedia documents: Application to pornographic filtering and thematic categorization
Published in: Cognitive Technologies
Language: English
Date: 2008
Abstract: We propose a generative statistical model for the classification of semi-structured multimedia documents. Its main originality is its ability to simultaneously take into account the structural and the content information present in a semi-structured document, and also to cope with different types of content (text, image, etc.). We then present the results obtained on two sets of experiments:
  • one set concerns the filtering of pornographic Web pages;
  • the second concerns the thematic classification of Wikipedia documents.
R: 0  C: 0

Title: The Wikipedia XML corpus
Language: English
Date: 2006
Abstract: Wikipedia is a well-known free-content, multilingual encyclopedia written collaboratively by contributors around the world. Anybody can edit an article using a wiki markup language that offers a simplified alternative to HTML. This encyclopedia is composed of millions of articles in different languages.
R: 0  C: 1