Computer guesses age and gender of tweeters
TweetGenie is at least as accurate as people
13 May 2013
Researchers at the Meertens Institute and the University of Twente are today launching TweetGenie, a computer program that can guess the age and gender of Dutch tweeters based on their use of language. The program correctly guesses a tweeter’s gender in 85 percent of cases. When estimating a tweeter’s age, the computer is off by less than four years on average. This means that the computer’s estimate is already slightly more accurate than a human’s.
Everyone tweets in their own particular way. However, groups do exhibit specific patterns. Research carried out by the Meertens Institute and the University of Twente has shown that young people on Twitter more often tweet about themselves and about subjects like school. They also use more smiley faces than older tweeters. Older tweeters, in turn, use longer words and longer sentences, and they tend to include links and hashtags in their tweets.
The researchers have now developed the first version of a computer program, known as TweetGenie, which can estimate someone’s age and gender fairly accurately on the basis of their tweets. When making an estimate, the program focuses purely on the tweeter’s use of language, not on their name, photograph, or profile.
Better than the computer?
You can enter anyone’s Twitter profile in the program, provided that the person mainly tweets in Dutch. You can also run anonymous Twitter profiles through the program to see if you are any better than the computer at estimating the age and gender of randomly selected tweeters. The first version of this program is being launched today. The researchers hope to refine the program further through user feedback. As yet, it is very difficult to estimate the age of tweeters who are older than 35.
Stories and rumours
The program is part of Dong Nguyen’s PhD research. One of her goals is to find out how stories and rumours spread through social media. To do this, it is vital to be able to identify different types of users. This newly developed technique may also have potential applications in various areas of marketing.
The program was developed by researchers at the Meertens Institute and at the University of Twente. The requisite linguistic expertise was provided by the Meertens Institute, which studies Dutch language and culture, with a focus on factors that shape everyday life in our society. Researchers from the Department of Human Media Interaction at the University of Twente’s CTIT research institute were responsible for the more technical aspects of this study. The study was partly funded by the Netherlands Organization for Scientific Research (NWO) and the Royal Netherlands Academy of Arts and Sciences (KNAW).
Note to the press
You can find the program at www.tweetgenie.nl. For further details, or an electronic version of the scientific article entitled “How Old Do You Think I Am?: A Study of Language and Age in Twitter”, please contact Joost Bruysters, Science Information Officer at the University of Twente, on +31 (0)6 1048 8228.