A comparison of dimensionality reduction techniques for Web structure mining

A comparison of dimensionality reduction techniques for Web structure mining is a 2007 conference paper written in English by Chikhi N.F., Rothenburger B., and Aussenac-Gilles N., published in the Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007.

Abstract

In many domains, dimensionality reduction techniques have been shown to be very effective for elucidating the underlying semantics of data. Thus, in this paper we investigate the use of various dimensionality reduction techniques (DRTs) to extract the implicit structures hidden in the web hyperlink connectivity. We apply and compare four DRTs, namely Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA), and Random Projection (RP). Experiments conducted on three datasets allow us to assert the following: NMF outperforms PCA and ICA in terms of stability and interpretability of the discovered structures; the well-known WebKB dataset, used in a large number of works on the analysis of hyperlink connectivity, seems ill-suited to this task, and we suggest using the more recent Wikipedia dataset instead.
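To make the idea of applying a DRT to hyperlink connectivity concrete, here is a minimal sketch of NMF on a link adjacency matrix. It is not the authors' implementation: the six-page toy graph, the function names, and the choice of Lee-Seung multiplicative updates are all assumptions made for illustration, since the abstract does not say which NMF algorithm or preprocessing the paper uses.

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frobenius_error(A, W, H):
    """Frobenius norm of A - W*H (reconstruction error)."""
    WH = matmul(W, H)
    return sum((a - b) ** 2
               for ra, rb in zip(A, WH)
               for a, b in zip(ra, rb)) ** 0.5

def nmf(A, k, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix A (n x m) into W (n x k) and H (k x m)
    using multiplicative updates (an assumed algorithm, see lead-in)."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    # Strictly positive random initialisation keeps the updates well defined.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H)
        Wt = transpose(W)
        num = matmul(Wt, A)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (A H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(A, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H

# Hypothetical toy web graph: A[i][j] = 1 if page i links to page j.
# Pages 0-2 link among themselves, as do pages 3-5.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

W0, H0 = nmf(A, k=2, iterations=0)   # random starting point
W, H = nmf(A, k=2, iterations=200)   # factors after the updates

# Each row of W gives a page's weight on the k latent link structures.
for page, weights in enumerate(W):
    print(page, [round(w, 3) for w in weights])
```

Because both factors stay non-negative, each page's row in W can be read as additive membership weights in latent link structures, which is one plausible reading of the interpretability the abstract attributes to NMF.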

Cited by

This publication has been cited 6 times, but no articles for the citing works are available in WikiPapers.