Denis Turdakov

From WikiPapers
Jump to: navigation, search

Denis Turdakov is an author.

Publications

Only those publications related to wikis are shown here.
Title Keyword(s) Published in Language DateThis property is a special property in this wiki. Abstract R C
A category-driven approach to deriving domain specific subset of Wikipedia CEUR Workshop Proceedings English 2011 While many researchers attempt to build up different kinds of ontologies by means of Wikipedia, the possibility of deriving high-quality domain specific subset of Wikipedia using its own category structure still remains undervalued. We prove the necessity of such processing in this paper and also propose an appropriate technique. As a result, the size of knowledge base for our text processing framework has been reduced by more than order, while the precision of disambiguating musical metadata (ID3 tags) has decreased from 98% to 64%. 0 0
Blognoon: Exploring a topic in the blogosphere Blogs
Concept search
Semantic search
Wikipedia
Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011 English 2011 We demonstrate Blognoon, a semantic blog search engine with the focus on topic exploration and navigation. Blognoon provides concept search instead of traditional keywords search and improves ranking by identifying main topics of posts. It enhances navigation over the Blogosphere with faceted interfaces and recommendations. 0 0
Accuracy estimate and optimization techniques for SimRank computation VLDB Journal 2010 The measure of similarity between objects is a very useful tool in many areas of computer science, including information retrieval. {SimRank} is a simple and intuitive measure of this kind, based on a graph-theoretic model. {SimRank} is typically computed iteratively, in the spirit of {PageRank.} However, existing work on {SimRank} lacks accuracy estimation of iterative computation and has discouraging time complexity. In this paper, we present a technique to estimate the accuracy of computing {SimRank} iteratively. This technique provides a way to find out the number of iterations required to achieve a desired accuracy when computing {SimRank.} We also present optimization techniques that improve the computational complexity of the iterative algorithm from O(n4) in the worst case to {min(O(nl),} O(n3/ log2n)), with n denoting the number of objects, and l denoting the number object-to-object relationships. We also introduce a threshold sieving heuristic and its accuracy estimation that further improves the efficiency of the method. As a practical illustration of our techniques, we computed {SimRank} scores on a subset of English Wikipedia corpus, consisting of the complete set of articles and category links. {Springer-Verlag} 2009. 0 0
HMM expanded to multiple interleaved chains as a model for Word Sense Disambiguation PACLIC 23 - Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation English 2009 The paper proposes a method for Word Sense Disambiguation based on an expanded Hidden Markov Model. The method is based on our observation that natural language text typically traces multiple interleaved chains consisting of semantically related terms. The observation confirms that the classical HMM is too restricted for the WSD task. We thus propose the expansion of HMM to support multiple interleaved chains. The paper presents an algorithm for computing the most probable sequence of meanings for terms in text and proposes a technique for estimating parameters of the model with the aid of structure and content of Wikipedia. Experiments indicate that the presented method produces systematically better WSD results than the existing state-of-the-art knowledge-based WSD methods. 0 0
Accuracy estimate and optimization techniques for simrank computation Proceedings of the VLDB Endowment English 2008 The measure of similarity between objects is a very useful tool in many areas of computer science, including informa-tion retrieval. SimRank is a simple and intuitive measure of this kind, based on graph-theoretic model. SimRank is typ-ically computed iteratively, in the spirit of PageRank. How-ever, existing work on SimRank lacks accuracy estimation of iterative computation and has discouraging time complexity. In this paper we present a technique to estimate the ac-curacy of computing SimRank iteratively. This technique provides a way to find out the number of iterations required to achieve a desired accuracy when computing SimRank. We also present optimization techniques that improve the com-putational complexity of the iterative algorithm from O(n 4) to O(n 3) in the worst case. We also introduce a threshold sieving heuristic and its accuracy estimation that further improves the efficiency of the method. As a practical illustration of our techniques we computed SimRank scores on a subset of English Wikipedia corpus, consisting of the complete set of articles and category links. 0 0
Efficient Ranking and Computation of Semantic Relatedness and its Application to Word Sense Disambiguation English 2008 0 0
Harnessing Wikipedia for Smart Tags Clustering English 2008 0 0
Semantic relatedness metric for Wikipedia concepts based on link analysis and its application to word sense disambiguation CEUR Workshop Proceedings English 2008 Wikipedia has grown into a high quality up-todate knowledge base and can enable many knowledge-based applications, which rely on semantic information. One of the most general and quite powerful semantic tools is a measure of semantic relatedness between concepts. Moreover, the ability to efficiently produce a list of ranked similar concepts for a given concept is very important for a wide range of applications. We propose to use a simple measure of similarity between Wikipedia concepts, based on Dice's measure, and provide very efficient heuristic methods to compute top k ranking results. Furthermore, since our heuristics are based on statistical properties of scale-free networks, we show that these heuristics are applicable to other complex ontologies. Finally, in order to evaluate the measure, we have used it to solve the problem of word-sense disambiguation. Our approach to word sense disambiguation is based solely on the similarity measure and produces results with high accuracy. 0 1
Recommender System Based on User-generated Content English 2007 0 0