Project funded: responsible use of free text in medical prediction research

With recent progress in artificial intelligence and natural language processing, a rapidly growing number of medical prediction studies use text mining tools to automatically collect study variables and outcomes from free text in medical records. However, the potential impact of text mining errors on the results of these studies has received insufficient attention, and preconditions for responsible use of free text in such studies are absent. These include minimum text mining quality and reporting requirements, as well as guidance on interpretation pitfalls, such as the fact that absence of information in textual notes is generally not evidence of absence.

As part of the NWA (National Science Agenda) route on responsible access to and use of big data, a team from UMC Utrecht, Utrecht University, and the University of California San Francisco received funding to study how erroneous text mining models may induce bias in subsequent medical prediction studies. The team aims to determine preconditions and recommendations for the responsible conduct, reporting, and interpretation of prediction research that uses variables automatically collected from free text.

Would you like to know more about this project? Please contact Artuur Leeuwenberg.

Research team: Artuur Leeuwenberg (project leader, UMCU), Ewoud Schuit (UMCU), Hans Reitsma (UMCU), Madhumita Sushil (UCSF), Laura Boeschoten (UU), and Ayoub Bagheri (UU).