How much is said in a tweet? A multilingual, information-theoretic perspective

From WikiPapers

How much is said in a tweet? A multilingual, information-theoretic perspective is a 2013 conference paper written in English by Neubig G. and Duh K., published in AAAI Spring Symposium - Technical Report.

Abstract

This paper describes a multilingual study on how much information is contained in a single post of microblog text from Twitter in 26 different languages. In order to answer this question in a quantitative fashion, we take an information-theoretic approach, using entropy as our criterion for quantifying "how much is said" in a tweet. Our results find that, as expected, languages with larger character sets such as Chinese and Japanese contain more information per character than other languages. However, we also find that, somewhat surprisingly, information per character does not have a strong correlation with information per microblog post, as authors of microblog posts in languages with more information per character do not necessarily use all of the space allotted to them. Finally, we examine the relative importance of a number of factors that contribute to whether a language has more or less information content in each character or post, and also compare the information content of microblog text with more traditional text from Wikipedia.
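The two quantities the abstract contrasts, information per character and information per post, can be illustrated with a minimal sketch. It assumes a simple add-one smoothed character unigram model as the probability estimate; this is a stand-in chosen for brevity, not necessarily the estimator the authors used. The function names `train_unigram`, `bits_per_post`, and `bits_per_char` are hypothetical. Information per post is the sum of per-character surprisal, so a language with more bits per character can still yield fewer bits per post when its posts are shorter.

```python
import math
from collections import Counter

# Hypothetical sketch: a character unigram model stands in for the paper's
# entropy estimator (assumption, not the authors' method).

def train_unigram(corpus, alpha=1.0):
    """Return an add-alpha smoothed character unigram model as a probability function."""
    counts = Counter(ch for text in corpus for ch in text)
    total = sum(counts.values())
    # +1 reserves one share of probability mass for a single "unknown" class;
    # every unseen character is assigned that class's probability.
    vocab_size = len(counts) + 1

    def prob(ch):
        return (counts.get(ch, 0) + alpha) / (total + alpha * vocab_size)

    return prob


def bits_per_post(post, prob):
    """Information content of a whole post in bits: the sum of -log2 p(ch)."""
    return sum(-math.log2(prob(ch)) for ch in post)


def bits_per_char(post, prob):
    """Average information per character in bits (0.0 for an empty post)."""
    return bits_per_post(post, prob) / len(post) if post else 0.0


if __name__ == "__main__":
    model = train_unigram(["hello world", "hello there"])
    for post in ["hello", "zq"]:
        print(post, round(bits_per_char(post, model), 2), round(bits_per_post(post, model), 2))
```

In the usage lines, the two-character post made of unseen characters has higher bits per character than "hello", yet its total per-post information can be smaller because it is shorter, which mirrors the weak correlation the abstract reports.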
