Marco Cristo

From WikiPapers

Marco Cristo is an author.

Publications

Only those publications related to wikis are shown here.

Title: An investigation of the relationship between the amount of extra-textual data and the quality of Wikipedia articles
Keywords: Content quality, Correlations, Extra-textual data, Wikipedia
Published in: WebMedia 2013 - Proceedings of the 19th Brazilian Symposium on Multimedia and the Web
Language: English
Date: 2013
Abstract: Wikipedia, a web-based collaboratively maintained free encyclopedia, is emerging as one of the most important websites on the internet. However, its openness raises many concerns about the quality of the articles and how to assess it automatically. In the Portuguese-speaking Wikipedia, articles can be rated by bots and by the community. In this paper, we investigate the correlation between these ratings and the count of media items (namely images and sounds) through a series of experiments. Our results show that article ratings and the count of media items are correlated.
R: 0  C: 0
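
As a reading aid only, the sketch below illustrates the kind of analysis this abstract describes: computing a rank correlation between article ratings and media-item counts. The data and its values are invented, and since the abstract does not say which correlation coefficient was used, Spearman's rank correlation is assumed.

```python
# Minimal sketch (not the authors' code): correlate hypothetical article
# ratings with hypothetical media-item counts. Spearman is an assumption;
# the paper does not specify the correlation coefficient.
import pandas as pd
from scipy.stats import spearmanr

# Invented example data: one row per article.
articles = pd.DataFrame({
    "rating":      [1, 2, 2, 3, 4, 4, 5],   # community/bot quality rating
    "media_items": [0, 1, 3, 2, 6, 5, 9],   # images + sounds in the article
})

rho, p_value = spearmanr(articles["rating"], articles["media_items"])
print(f"Spearman rho = {rho:.3f}, p = {p_value:.3f}")
```
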
Title: How do metrics of link analysis correlate to quality, relevance and popularity in Wikipedia?
Keywords: Information retrieval, Link analysis, Quality of content, Wikipedia
Published in: WebMedia 2013 - Proceedings of the 19th Brazilian Symposium on Multimedia and the Web
Language: English
Date: 2013
Abstract: Many links between Web pages can be viewed as indicative of the quality and importance of the pages they point to. Accordingly, several studies have proposed metrics based on links to infer web page content quality. However, as far as we know, the only work that has examined the correlation between such metrics and content quality was a limited study that left many open questions. Although these metrics have proven successful in ranking pages returned as answers to queries submitted to search engines, it is not possible to determine the specific contribution of factors such as quality, popularity, and importance to the results. This difficulty is partially due to the fact that such information is hard to obtain for Web pages in general. Unlike ordinary Web pages, the quality, importance and popularity of Wikipedia articles are either evaluated by human experts or can be easily estimated. Thus, it is feasible to verify the relation between link analysis metrics and these factors in Wikipedia articles, which is our goal in this work. To accomplish this, we implemented several link analysis algorithms and compared their resulting rankings with those created by human evaluators with respect to quality, popularity and importance. We found that the metrics are more strongly correlated with quality and popularity than with importance, and that the correlation is moderate.
R: 0  C: 0
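
The sketch below is a toy illustration of the comparison this abstract outlines, not a reproduction of the authors' experiments: one link-analysis metric (PageRank) is computed over an invented article link graph and its ranking is compared with invented human scores via rank correlation.

```python
# Minimal sketch (not the authors' implementation): PageRank over a toy
# article link graph, compared against hypothetical human scores.
import networkx as nx
from scipy.stats import spearmanr

# Directed link graph: an edge A -> B means article A links to article B.
graph = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "C"),
                    ("D", "C"), ("C", "A"), ("D", "B")])

pagerank = nx.pagerank(graph, alpha=0.85)

# Hypothetical human evaluation of the same articles (e.g. quality on a 1-5 scale).
human_scores = {"A": 3, "B": 2, "C": 5, "D": 1}

articles = sorted(graph.nodes)
rho, _ = spearmanr([pagerank[a] for a in articles],
                   [human_scores[a] for a in articles])
print(f"Rank correlation between PageRank and human scores: {rho:.3f}")
```
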
Title: Selecting keywords to represent web pages using Wikipedia information
Keywords: Context-based advertising, Keyword extraction, Wikipedia
Published in: WebMedia'12 - Proceedings of the 2012 Brazilian Symposium on Multimedia and the Web
Language: English
Date: 2012
Abstract: In this paper we present three new methods to extract keywords from web pages using Wikipedia as an external source of information. The information used from Wikipedia includes the titles of articles, the co-occurrence of keywords and the categories associated with each Wikipedia definition. We compare our methods with three keyword extraction methods used as baselines: (i) all the terms of a web page, (ii) a TF-IDF implementation that extracts single weighted words from a web page and (iii) a previously proposed Wikipedia-based keyword extraction method from the literature. We compare our three keyword extraction methods with the baseline methods in three distinct scenarios, all related to our target application, which is the selection of ads in a context-based advertising system. In the first scenario, the target pages on which to place ads were extracted from Wikipedia articles, whereas the target pages in the other two scenarios were extracted from a news website. Experimental results show that our methods are quite competitive solutions for the task of selecting good keywords to represent target web pages, while remaining simple, effective and time efficient. For instance, in the first scenario our best method for extracting keywords from Wikipedia articles achieved an improvement of 33% over the second-best baseline, and a gain of 26% when considering all the terms.
R: 0  C: 0
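
For readers unfamiliar with baseline (ii) mentioned in the abstract, the sketch below shows a generic TF-IDF keyword ranking over toy documents; it does not reproduce the authors' Wikipedia-based methods, and the corpus and target page are invented.

```python
# Minimal sketch of a generic TF-IDF keyword baseline (toy data, not the
# paper's Wikipedia-based methods).
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "wikipedia articles are written collaboratively by volunteers",
    "advertising systems select ads that match the content of a page",
    "keyword extraction finds terms that best describe a target page",
]
target_page = "selecting keywords from a wikipedia page for content based advertising"

vectorizer = TfidfVectorizer(stop_words="english")
vectorizer.fit(corpus)                                  # learn vocabulary and IDF weights
weights = vectorizer.transform([target_page]).toarray()[0]

terms = vectorizer.get_feature_names_out()
top_keywords = sorted(zip(terms, weights), key=lambda kw: kw[1], reverse=True)[:5]
print([term for term, weight in top_keywords if weight > 0])
```
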
Title: Automatic assessment of document quality in web collaborative digital libraries
Keywords: Machine learning, Quality assessment, Quality features, SVM, Wiki
Published in: Journal of Data and Information Quality
Language: English
Date: 2011
Abstract: The old dream of a universal repository containing all of human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct collaborative participation of people. Wikipedia is a great example. It is an enormous repository of information with free access and open editing, created by the community in a collaborative manner. However, this large amount of information, made available democratically and virtually without any control, raises questions about its quality. In this work, we explore a significant number of quality indicators and study their capability to assess the quality of articles from three Web collaborative digital libraries. Furthermore, we explore machine learning techniques to combine these quality indicators into one single assessment. Through experiments, we show that the most important quality indicators are those which are also the easiest to extract, namely, the textual features related to the structure of the article. Moreover, to the best of our knowledge, this work is the first that shows an empirical comparison between Web collaborative digital libraries regarding the task of assessing article quality.
R: 0  C: 0
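
As a rough illustration of combining quality indicators into one single assessment with machine learning, the sketch below fits an SVM regressor to a few invented indicator values; the actual indicators, data and model configuration used in the paper are not reproduced here.

```python
# Minimal sketch (invented data): combine a few quality indicators into a
# single score with an SVM regressor.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Each row: [character length, section count, image count, citation count]
features = np.array([
    [1200,  4,  1,  3],
    [4500, 12,  5, 20],
    [ 300,  1,  0,  0],
    [8000, 20, 10, 45],
    [2500,  7,  2,  8],
])
quality = np.array([2.0, 4.0, 1.0, 5.0, 3.0])   # hypothetical 1-5 quality scale

model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
model.fit(features, quality)

print(f"Predicted quality: {model.predict(np.array([[3000, 9, 3, 10]]))[0]:.2f}")
```
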
Title: Automatic quality assessment of content created collaboratively by web communities: a case study of Wikipedia
Keywords: Machine learning, Quality assessment, SVM, Wikipedia
Language: English
Date: 2009
Abstract: The old dream of a universal repository containing all the human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct collaborative participation of people. Wikipedia is a great example. It is an enormous repository of information with free access and editing, created by the community in a collaborative manner. However, this large amount of information, made available democratically and virtually without any control, raises questions about its relative quality. In this work we explore a significant number of quality indicators, some of them proposed by us and used here for the first time, and study their capability to assess the quality of Wikipedia articles. Furthermore, we explore machine learning techniques to combine these quality indicators into one single assessment judgment. Through experiments, we show that the most important quality indicators are the easiest ones to extract, namely, textual features related to length, structure and style. We were also able to determine which indicators did not contribute significantly to the quality assessment. These were, coincidentally, the most complex features, such as those based on link analysis. Finally, we compare our combination method with a state-of-the-art solution and show significant improvements in terms of effective quality prediction.
R: 0  C: 3
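
The abstract singles out cheap textual features (length, structure and style) as the most important indicators. The sketch below computes a handful of such features from raw wikitext as an illustration only; the paper's exact feature set is not reproduced.

```python
# Minimal sketch: a few simple length/structure/style indicators computed
# from raw wikitext (illustrative only, not the paper's feature set).
import re

def textual_features(wikitext: str) -> dict:
    """Compute a few simple length, structure and style indicators."""
    sentences = [s for s in re.split(r"[.!?]+", wikitext) if s.strip()]
    words = re.findall(r"\w+", wikitext)
    return {
        "char_length": len(wikitext),
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "section_count": len(re.findall(r"^==+.*==+\s*$", wikitext, re.MULTILINE)),
        "link_count": len(re.findall(r"\[\[.*?\]\]", wikitext)),
    }

sample = "== History ==\nWikipedia grew quickly. It is edited by [[volunteers]]."
print(textual_features(sample))
```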