For her doctoral research, Suzanne Kleijn (Language and communication) developed a new readability formula to measure text difficulty. The new formula, U-Read, predicts readability 20% better than popular Dutch readability formulae. The defence will take place on 6 April in the Utrecht University Hall.
Is my text comprehensible for my audience? That is the question readability formulae claim to answer. At the press of a button, the readability of a text is assessed and users know whether the text suits its intended readers. Despite a steady stream of criticism, the need for objective measures of readability has only increased. Fortunately, developments in computational linguistics have opened up new possibilities for improving the old readability formulae. In her dissertation, Suzanne Kleijn combined current language technology with insights from readability research and discourse processing to build an empirically validated readability formula for Dutch secondary school readers: U-Read.
Kleijn investigated the relationship between linguistic features and two aspects of readability: comprehension and processing ease. Comprehension was measured using a specially developed cloze procedure (the ‘HyTeC-cloze’) and processing ease was measured using eye tracking. Readability differences between texts and differences between stylistic variants of the same text were studied at the same time. In three separate experiments, only one factor was changed at a time (the lexical complexity, the syntactic complexity or the number of coherence markers within texts) to see how each affects readability.
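For readers unfamiliar with cloze testing, the general idea can be sketched in a few lines of Python. This is a generic fixed-interval cloze, not the HyTeC-cloze itself, whose gap-selection rules are not described here. The function names `make_cloze` and `cloze_score`, the every-third-word interval and the exact-match scoring are all assumptions made for illustration.

```python
import re


def make_cloze(text, every_n=5, blank="_____"):
    """Blank out every n-th word; return the gapped text and the answer key.

    Generic illustration only: the gap interval is an assumption,
    not the selection rule of any specific cloze procedure.
    """
    tokens = text.split()
    answers = []
    for i in range(every_n - 1, len(tokens), every_n):
        # Split the token into the word and any trailing punctuation,
        # so the punctuation stays visible after the gap.
        match = re.match(r"^(\w+)(\W*)$", tokens[i])
        if match is None:
            continue  # leave tokens that are not plain words untouched
        word, trailing = match.groups()
        answers.append(word)
        tokens[i] = blank + trailing
    return " ".join(tokens), answers


def cloze_score(answers, responses):
    """Share of gaps filled with exactly the original word (case-insensitive)."""
    if not answers:
        return 0.0
    correct = sum(a.lower() == r.strip().lower() for a, r in zip(answers, responses))
    return correct / len(answers)


gapped, answers = make_cloze("De kat zit op de mat en kijkt naar de vogel.", every_n=3)
print(gapped)
print(cloze_score(answers, ["zit", "stoel", "naar"]))
```

A reader's comprehension score for a text is then simply the proportion of gaps restored correctly; averaging those scores over many readers gives a per-text comprehension measure of the kind a readability formula tries to predict.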
While reducing a text’s lexical complexity or syntactic complexity improved text comprehension and increased processing ease, coherence markers showed mixed results. Adding contrastive connectives (e.g., maar ‘but’) or causal connectives (e.g., dus ‘so’) had a positive effect on comprehension of their immediate context, but inserting additive connectives (e.g., daarnaast ‘furthermore’) had a negative effect on comprehension.
The formula U-Read is based on the combined text comprehension data of the three experiments, which include comprehension scores for 120 texts from more than 2,900 Dutch secondary school readers. Using five linguistic features, U-Read predicts readability 20% better than two popular Dutch readability formulae, the Flesch-Douma formula and the CLIB formula. Although these features are good predictors of text difficulty when comparing different texts, they overestimate how much the readability of a specific text will improve when the complexity associated with one of these features is reduced. This is because texts also differ in content and style. When text content is kept the same, the effects of changing a linguistic feature are relatively small compared to the effects predicted on the basis of between-text differences.
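To give a sense of what a traditional readability formula looks like, the sketch below computes the Flesch-Douma score as it is commonly cited: 206.84 minus 0.77 times the number of syllables per 100 words, minus 0.93 times the average number of words per sentence. The coefficients come from that common formulation, not from the dissertation, and the function name `flesch_douma` and its count-based inputs are assumptions for illustration. U-Read itself is not reproduced, because its five features and their weights are not given here.

```python
def flesch_douma(n_words, n_sentences, n_syllables):
    """Flesch-Douma reading ease from raw counts (higher means easier).

    Coefficients follow the commonly cited form of Douma's Dutch
    adaptation of the Flesch formula; treat them as an assumption.
    Counting words, sentences and syllables is left to the caller.
    """
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("n_words and n_sentences must be positive")
    syllables_per_100_words = 100.0 * n_syllables / n_words
    words_per_sentence = n_words / n_sentences
    return 206.84 - 0.77 * syllables_per_100_words - 0.93 * words_per_sentence


# Hypothetical counts: 100 words, 10 sentences, 150 syllables.
print(round(flesch_douma(100, 10, 150), 2))
```

Because such a formula only looks at word and sentence length, shortening sentences always raises the score, which illustrates why predictions based on between-text differences can overstate the gain from rewriting a single text.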