Research at the University of Twente: Wikipedia article readability too low

Extensive research into the readability of millions of English-language Wikipedia articles

3 September 2012

Research conducted by the University of Twente and the company Babbletics indicates that the readability of most English-language Wikipedia articles is too low. This is remarkable, since the popular online encyclopaedia’s objective is to make knowledge available to all. The UT study shows that 73.5 percent of the articles are ‘fairly difficult’ to ‘very difficult’ to read, and 45 percent fall into the two hardest categories, ‘difficult’ and ‘very difficult’. The results will be published today in the scientific journal First Monday.

In just a few years, Wikipedia has grown into a frequently consulted resource. Its professed goal is to make human knowledge accessible to everyone, and the online encyclopaedia is now generally accepted as a relatively reliable source of information. According to Teun Lucassen, a researcher at the University of Twente’s CTIT research institute, however, the readability of its articles receives too little attention.

In an extensive study, UT and Babbletics investigated the readability of virtually all English-language Wikipedia articles. They used the Flesch Reading Ease test, a widely used formula that estimates a text’s readability from its average sentence length (words per sentence) and average word length (syllables per word). Texts with many long sentences and long words receive lower scores and are considered more difficult to read. A score between 60 and 70 is considered standard, a score between 50 and 60 fairly difficult, and a score below 50 difficult to very difficult.
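The standard Flesch Reading Ease formula is 206.835 − 1.015 × (words ÷ sentences) − 84.6 × (syllables ÷ words). The Python sketch below implements it together with the conventional score bands. The function names are hypothetical, and counting syllables is the weak point: here it is approximated by counting groups of vowels, an assumption made for illustration only. The study’s own tooling almost certainly counts syllables differently, so scores from this sketch should not be compared directly with the figures reported here.

```python
import re


# Hypothetical helper: a rough syllable heuristic, not a dictionary lookup.
def count_syllables(word: str) -> int:
    """Estimate syllables by counting runs of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' (as in "make") usually adds no syllable; "-le" (as in "table") does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)  # every word has at least one syllable


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease score; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_category(score: float) -> str:
    """Map a score to the conventional Flesch difficulty bands."""
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "very difficult"


score = flesch_reading_ease("The cat sat on the mat.")
print(round(score, 2), flesch_category(score))
```

A six-word sentence made only of one-syllable words scores above 100 (‘very easy’), while long sentences full of polysyllabic words quickly drop below 50.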

The study reveals that the average Wikipedia article scores 51.2. Nearly three quarters of the articles (73.5 percent) score below 60 (‘fairly difficult’ to ‘very difficult’), and 45 percent score below 50, making them ‘difficult’ to ‘very difficult’ to read.
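Once every article has a score, the headline figures are simple aggregates. The sketch below, built around a hypothetical function `summarize_scores`, computes the mean and the share of articles below the 60 and 50 thresholds. The four scores in the example are made up for illustration and are not data from the study.

```python
from statistics import mean


def summarize_scores(scores: list[float]) -> dict[str, float]:
    """Mean score plus the percentage of articles below the 60 and 50 thresholds."""
    if not scores:
        raise ValueError("need at least one score")
    n = len(scores)
    return {
        "mean": mean(scores),
        # Below 60: 'fairly difficult' to 'very difficult'
        "pct_below_60": 100 * sum(s < 60 for s in scores) / n,
        # Below 50: 'difficult' to 'very difficult' (a subset of the group above)
        "pct_below_50": 100 * sum(s < 50 for s in scores) / n,
    }


# Made-up example scores, not figures from the study.
print(summarize_scores([72.0, 58.5, 44.0, 39.5]))
# {'mean': 53.5, 'pct_below_60': 75.0, 'pct_below_50': 50.0}
```

Because every article that scores below 50 also scores below 60, the second share is contained in the first: the 45 percent of ‘difficult’ to ‘very difficult’ articles is part of the 73.5 percent, not in addition to it.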

‘Simple Wikipedia’

In 2003, Wikipedia introduced a Simple English version of the online encyclopaedia, specifically to address the readability problem. This version is intended for target groups such as children, adults with learning disabilities and people who are learning English. The study shows that this version, too, leaves much to be desired when it comes to readability. As many as 42.3 percent of its articles scored below 60 on the Flesch Reading Ease test, whereas a score of 80 would be expected given the target groups. Furthermore, the readability of the ‘Simple English’ version has been declining steadily for years.

Recommendations

According to Lucassen, Wikipedia is a valuable resource, but article readability has not been sufficiently emphasized. “If Wikipedia were to give authors advice when writing, editing or posting articles, warning them of sentences or words that are too long, then the resource’s readability could be improved relatively easily.”
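Lucassen’s suggestion could take the form of a simple checker that flags long sentences and long words while an author writes. The sketch below is hypothetical and does not describe any existing Wikipedia tool: the names `readability_warnings` and `ReadabilityWarning` and both thresholds (more than 25 words per sentence, more than 12 letters per word) are arbitrary assumptions chosen for illustration. It measures word length in letters rather than syllables to keep the example short.

```python
import re
from typing import NamedTuple


class ReadabilityWarning(NamedTuple):
    kind: str  # "long sentence" or "long word"
    text: str  # the offending sentence or word


def readability_warnings(
    text: str, max_sentence_words: int = 25, max_word_letters: int = 12
) -> list[ReadabilityWarning]:
    """Flag sentences and words that exceed the (hypothetical) length thresholds."""
    found = []
    # Split after sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)
        if len(words) > max_sentence_words:
            found.append(ReadabilityWarning("long sentence", sentence))
        for word in words:
            if len(word) > max_word_letters:
                found.append(ReadabilityWarning("long word", word))
    return found


draft = "This is incomprehensibilities. " + "word " * 30 + "end."
for warning in readability_warnings(draft):
    print(warning.kind, "->", warning.text[:40])
```

For this draft the checker reports one long word and one 31-word sentence. In an editor, such warnings would be shown to the author as suggestions rather than enforced as rules.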

Research

The study was conducted by Teun Lucassen and Jan Maarten Schraagen of the Department of Cognitive Psychology and Ergonomics (CTIT, University of Twente) and by Roald Dijkstra of Babbletics, a company that specializes in ‘interaction design’. They used the Flesch Reading Ease test to assess millions of English-language Wikipedia articles containing more than five sentences. Their method is demonstrated on the website www.readabilityofwikipedia.com, where you can test the readability of individual Wikipedia articles or of your own texts.

Note to the press

For further details, or an electronic version of the article ‘Readability of Wikipedia’, please contact the University of Twente Science Information Officer, Joost Bruysters, +31 (0)53-489 2773 / +31 (0)6-104 88 228.