Carlotta Domeniconi


Carlotta Domeniconi is an author.

Publications

Only those publications related to wikis are shown here.
Embedding Semantics in LDA Topic Models
Keywords: Data-driven prior-knowledge, External prior-knowledge, Latent Dirichlet allocation, Online LDA, Semantic embedding, Topic models, Wikipedia-influenced topic model
Published in: Text Mining: Applications and Theory
Language: English
Date: 2010
Abstract: [No abstract available]
R: 0, C: 0

Towards a universal text classifier: Transfer learning using encyclopedic knowledge
Keywords: Text classifiers, Transfer learning, Wikipedia
Published in: ICDM Workshops 2009 - IEEE International Conference on Data Mining
Language: English
Date: 2009
Abstract: Document classification is a key task for many text mining applications. However, traditional text classification requires labeled data to construct reliable and accurate classifiers. Unfortunately, labeled data are seldom available. In this work, we propose a universal text classifier, which does not require any labeled document. Our approach simulates the capability of people to classify documents based on background knowledge. As such, we build a classifier that can effectively group documents based on their content, under the guidance of a few words describing the classes of interest. Background knowledge is modeled using encyclopedic knowledge, namely Wikipedia. The universal text classifier can also be used to perform document retrieval. In our experiments with real data, we test the feasibility of our approach for both the classification and retrieval tasks.
R: 0, C: 0

Building semantic kernels for text classification using Wikipedia
Language: English
Date: 2008
Abstract: Document classification presents difficult challenges due to the sparsity and high dimensionality of text data, and to the complex semantics of natural language. The traditional document representation is a word-based vector (Bag of Words, or BOW), where each dimension is associated with a term of the dictionary containing all the words that appear in the corpus. Although simple and commonly used, this representation has several limitations. It is essential to embed semantic information and conceptual patterns in order to enhance the prediction capabilities of classification algorithms. In this paper, we overcome the shortcomings of the BOW approach by embedding background knowledge derived from Wikipedia into a semantic kernel, which is then used to enrich the representation of documents. Our empirical evaluation with real data sets demonstrates that our approach successfully achieves improved classification accuracy with respect to the BOW technique and to other recently developed methods.
R: 0, C: 1

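The abstract above describes enriching the Bag-of-Words representation by folding a term-term proximity matrix, built from background knowledge, into a document kernel. The following is a minimal sketch of that general idea, not the paper's method: the proximity matrix S is a toy stand-in (the paper derives it from Wikipedia), and the example documents and the boosted term pair are illustrative assumptions.

    # Sketch: semantically smoothed BOW vectors and the resulting document kernel.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer

    docs = [
        "the car is driven on the road",
        "the truck is driven on the highway",
    ]

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs).toarray().astype(float)  # BOW matrix: documents x terms
    terms = list(vectorizer.get_feature_names_out())

    # Toy term-term proximity matrix: identity plus a boost between two related
    # terms. A real matrix would encode relatedness mined from Wikipedia.
    S = np.eye(len(terms))
    i, j = terms.index("car"), terms.index("truck")
    S[i, j] = S[j, i] = 0.5

    X_enriched = X @ S                # semantically smoothed document vectors
    K = X_enriched @ X_enriched.T     # semantic kernel: document-document similarities

    print(K)

With the identity-only matrix, the two documents are similar only through their shared words; the boosted entry raises their kernel similarity, which is the effect the enriched representation aims for.
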
Using Wikipedia for Co-clustering Based Cross-Domain Text Classification
Keywords: Cross-domain text classification, Co-clustering, Transfer learning, Wikipedia
Published in: ICDM
Language: English
Date: 2008
R: 0, C: 0