Weinan Zhang

From WikiPapers
Jump to: navigation, search

Weinan Zhang is an author.

Publications

Only those publications related to wikis are shown here.
Title Keyword(s) Published in Language DateThis property is a special property in this wiki. Abstract R C
DFT-extractor: A system to extract domain-specific faceted taxonomies from wikipedia Faceted taxonomy
Network motif
Wikipedia
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 Extracting faceted taxonomies from the Web has received increasing attention in recent years from the web mining community. We demonstrate in this study a novel system called DFT-Extractor, which automatically constructs domain-specific faceted taxonomies from Wikipedia in three steps: 1) It crawls domain terms from Wikipedia by using a modified topical crawler. 2) Then it exploits a classification model to extract hyponym relations with the use of motif-based features. 3) Finally, it constructs a faceted taxonomy by applying a community detection algorithm and a group of heuristic rules. DFT-Extractor also provides a graphical user interface to visualize the learned hyponym relations and the tree structure of taxonomies. 0 0
A semantic approach to recommending text advertisements for images Crossmedia mining
Semantic matching
Visual contextual advertising
RecSys'12 - Proceedings of the 6th ACM Conference on Recommender Systems English 2012 In recent years, more and more images have been uploaded and published on the Web. Along with text Web pages, images have been becoming important media to place relevant advertisements. Visual contextual advertising, a young research area, refers to finding relevant text advertisements for a target image without any textual information (e.g., tags). There are two existing approaches, advertisement search based on image annotation, and more recently, advertisement matching based on feature translation between images and texts. However, the state of the art fails to achieve satisfactory results due to the fact that recommended advertisements are syntactically matched but semantically mismatched. In this paper, we propose a semantic approach to improving the performance of visual contextual advertising. More specifically, we exploit a large high-quality image knowledge base (ImageNet) and a widely-used text knowledge base (Wikipedia) to build a bridge between target images and advertisements. The image-advertisement match is built by mapping images and advertisements into the respective knowledge bases and then finding semantic matches between the two knowledge bases. The experimental results show that semantic match outperforms syntactic match significantly using test images from Flickr. We also show that our approach gives a large improvement of 16.4% on the precision of the top 10 matches over previous work, with more semantically relevant advertisements recommended. Copyright © 2012 by the Association for Computing Machinery, Inc. (ACM). 0 0
Advertising Keywords Recommendation for Short-Text Web Pages Using Wikipedia Contextual advertising
Wikipedia
Advertising keywords recommendation
Topic-sensitive PageRank
ACM Trans. Intell. Syst. Technol. English 2012 0 0
MOTIF-RE: Motif-based hypernym/hyponym relation extraction from wikipedia links Hypernym/hyponym relation
Network motif
Wikipedia link
Lecture Notes in Computer Science English 2012 Hypernym/hyponym relation extraction plays an essential role in taxonomy learning. The conventional methods based on lexico-syntactic patterns or machine learning usually make use of content-related features. In this paper, we find that the proportions of hyperlinks with different semantic type vary markedly in different network motifs. Based on this observation, we propose MOTIF-RE, an algorithm of extracting hypernym/hyponym relation from Wikipedia hyperlinks. The extraction process consists of three steps: 1) Build a directed graph from a set of domain-specific Wikipedia articles. 2) Count the occurrences of hyperlinks in every three-node network motif and create a feature vector for every hyperlink. 3) Train a classifier to identify semantic relation of hyperlinks. We created three domain-specific Wikipedia article sets to test MOTIF-RE. Experiments on individual dataset show that MOTIF-RE outperforms the baseline algorithm by about 30% in terms of F1-measure. Cross-domain experimental results show similar, which proves that MOTIF-RE has fairly good domain adaptation ability. 0 0
Leveraging social networks to detect anomalous insider actions in collaborative environments Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, ISI 2011 English 2011 Collaborative information systems (CIS) enable users to coordinate efficiently over shared tasks. They are often deployed in complex dynamic systems that provide users with broad access privileges, but also leave the system vulnerable to various attacks. Techniques to detect threats originating from beyond the system are relatively mature, but methods to detect insider threats are still evolving. A promising class of insider threat detection models for CIS focus on the communities that manifest between users based on the usage of common subjects in the system. However, current methods detect only when a user's aggregate behavior is intruding, not when specific actions have deviated from expectation. In this paper, we introduce a method called specialized network anomaly detection (SNAD) to detect such events. SNAD assembles the community of users that access a particular subject and assesses if similarities of the community with and without a certain user are sufficiently different. We present a theoretical basis and perform an extensive empirical evaluation with the access logs of two distinct environments: those of a large electronic health record system (6,015 users, 130,457 patients and 1,327,500 accesses) and the editing logs of Wikipedia (2,388,955 revisors, 55,200 articles and 6,482,780 revisions). We compare SNAD with several competing methods and demonstrate it is significantly more effective: on average it achieves 20-30% greater area under an ROC curve. 0 0
Entity linking leveraging automatically generated annotation Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference English 2010 Entity linking refers entity mentions in a document to their representations in a knowledge base (KB). In this paper, we propose to use additional information sources from Wikipedia to find more name variations for entity linking task. In addition, as manually creating a training corpus for entity linking is laborintensive and costly, we present a novel method to automatically generate a large scale corpus annotation for ambiguous mentions leveraging on their unambiguous synonyms in the document collection. Then, a binary classifier is trained to filter out KB entities that are not similar to current mentions. This classifier not only can effectively reduce the ambiguities to the existing entities in KB, but also be very useful to highlight the new entities to KB for the further population. Furthermore, we also leverage on the Wikipedia documents to provide additional information which is not available in our generated corpus through a domain adaption approach which provides further performance improvements. The experiment results show that our proposed method outperforms the state-of-the-art approaches. 0 0
UIC at TREC 2006 Blog track NIST Special Publication English 2006 We developed a two-step approach that finds relevant blog documents containing opinioned content for a given query topic. The first step, retrieval step, is to find documents relevant to the query. The second step, opinion identification step, is to find the documents containing opinions within the scope of the document set from the retrieval step. In the retrieval step, we try to improve the retrieval effectiveness by retrieving based on concepts, and doing query expansion using pseudo feedback, Wikipedia feedback and web feedback. In the opinion identification step, we train a sentence classifier using subjective sentences (opinioned) and objective sentences (non-opinioned), which are relevant to a query topic. This classifier labels each sentence in a given document as either subjective or objective. A document containing subjective sentences relating to the query is finally labeled as an opinioned relevant document (ORD). We tried two strategies to rank the ORDs that became two submitted runs. 0 0