Web page rank prediction with PCA and EM clustering
Abstract In this paper we describe learning algorithms for Web page rank prediction. We consider linear regression models and combinations of regression with probabilistic clustering and Principal Components Analysis (PCA). These models are learned from time-series data sets and can predict the ranking of a set of Web pages in some future time. The first algorithm uses separate linear regression models. This is further extended by applying probabilistic clustering based on the EM algorithm. Clustering allows for the Web pages to be grouped together by fitting a mixture of regression models. A different method combines linear regression with PCA so as dependencies between different web pages can be exploited. All the methods are evaluated using real data sets obtained from Internet Archive, Wikipedia and Yahoo! ranking lists. We also study the temporal robustness of the prediction framework. Overall the system constitutes a set of tools for high accuracy pagerank prediction which can be used for efficient resource management by search engines.
Bibtextype inproceedings
Doi 10.1007/978-3-540-95995-3_9
Has author Zacharouli P., Titsias M., Vazirgiannis M.
Has extra keyword Cluster analysis, Clustering algorithms, Learning systems, Mathematical models, Principal component analysis, Regression analysis, Resource allocation, Search engine, Time series, World Wide Web, E-M algorithms, Em clustering, Internet archives, Linear regression models, Page ranks, Principal components analysis, Probabilistic clustering, Real data sets, Regression models, Resource managements, Time-series datum, Web pages, Wikipedia, Learning algorithms
Issn 0302-9743
Language English
Number of citations by publication 0
Number of references by publication 0
Pages 104–115
Published in Lecture Notes in Computer Science
Title Web page rank prediction with PCA and EM clustering
Type conference paper
Volume 5427 LNCS
Year 2009
Creation date 8 November 2014 07:47:19
Categories Publications without keywords parameter, Publications without license parameter, Publications without remote mirror parameter, Publications without archive mirror parameter, Publications without paywall mirror parameter, Conference papers, Publications without references parameter, Publications
Modification date 8 November 2014 07:47:19
Date 2009