How much is said in a tweet? A multilingual, information-theoretic perspective
|Author(s)||Neubig G., Duh K.|
|Published in||AAAI Spring Symposium - Technical Report|
|Keyword(s)||Unknown (Extra: Information contents, Information-theoretic approach, Micro-blog, Number of factors, Strong correlation, Wikipedia, Character sets, Information theory)|
How much is said in a tweet? A multilingual, information-theoretic perspective is a 2013 conference paper written in English by Neubig G. and Duh K., published in AAAI Spring Symposium - Technical Report.
This paper describes a multilingual study on how much information is contained in a single post of microblog text from Twitter in 26 different languages. In order to answer this question in a quantitative fashion, we take an information-theoretic approach, using entropy as our criterion for quantifying "how much is said" in a tweet. Our results find that, as expected, languages with larger character sets such as Chinese and Japanese contain more information per character than other languages. However, we also find that, somewhat surprisingly, information per character does not have a strong correlation with information per microblog post, as authors of microblog posts in languages with more information per character do not necessarily use all of the space allotted to them. Finally, we examine the relative importance of a number of factors that contribute to whether a language has more or less information content in each character or post, and also compare the information content of microblog text with more traditional text from Wikipedia.
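The distinction the abstract draws between information per character and information per post can be illustrated with a small sketch. The abstract does not say how entropy was estimated in the paper, so the code below assumes the simplest option, a plug-in unigram character entropy, and approximates information per post as bits per character times mean post length. The function names `char_entropy` and `info_per_post` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def char_entropy(posts):
    """Plug-in unigram entropy, in bits per character, pooled over all posts.

    Assumption: a unigram character model; the paper's estimator may differ.
    """
    counts = Counter(ch for post in posts for ch in post)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def info_per_post(posts):
    """Approximate bits per post as bits per character times mean post length."""
    mean_len = sum(len(p) for p in posts) / len(posts)
    return char_entropy(posts) * mean_len


# Toy corpora (hypothetical, not real tweets).
# "dense": eight equally likely characters, but very short posts.
dense = ["ab", "cd", "ef", "gh"]
# "sparse": two equally likely characters, but longer posts.
sparse = ["aabbaabb", "abababab"]

for name, corpus in [("dense", dense), ("sparse", sparse)]:
    print(name, char_entropy(corpus), info_per_post(corpus))
```

In this toy setting the "dense" corpus carries more bits per character yet fewer bits per post, because its posts are shorter. This is the same kind of mismatch the abstract reports, though the example only shows that it can happen, not the paper's measured values.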