The goal of this project is to improve the performance of active learning for screening large amounts of textual data by optimizing the hyperparameters of the learning algorithms in the open-source software ASReview. Users from the social sciences should be able to select a set of hyperparameters optimized for textual data from their own domain, instead of the currently implemented values, which were obtained from medical datasets.
For example, in 2020 researchers at Utrecht University screened 392,437 abstracts, of which only ~2% were relevant (source: https://asreview.nl/blog/project/systematic-reviews-uu-umc/). At 40 abstracts per hour, a single pass over these abstracts amounts to roughly 9,800 hours of screening. Even if we take a conservative estimate of ASReview's performance and assume that only two researchers screened each abstract for relevance, >10,000 hours could have been saved. If we can improve model performance by even a few percent, we can save an enormous amount of work (and tax money) worldwide.
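The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `screening_hours` and the `n_screeners` parameter are illustrative assumptions, and the abstract count and screening rate are the figures quoted in the text.

```python
def screening_hours(n_abstracts: int, abstracts_per_hour: float, n_screeners: int = 1) -> float:
    """Total person-hours when every screener reads every abstract."""
    return n_abstracts / abstracts_per_hour * n_screeners


N_ABSTRACTS = 392_437      # abstracts screened in 2020 (figure from the text)
ABSTRACTS_PER_HOUR = 40    # assumed screening rate (figure from the text)

# One full pass over all abstracts by a single screener.
single_pass = screening_hours(N_ABSTRACTS, ABSTRACTS_PER_HOUR)

# Assumption: two researchers each screen every abstract.
double_pass = screening_hours(N_ABSTRACTS, ABSTRACTS_PER_HOUR, n_screeners=2)

# Share of the double-screened workload that must be skipped to save 10,000 hours.
fraction_for_10k = 10_000 / double_pass

print(f"single pass: {single_pass:,.0f} h")
print(f"double pass: {double_pass:,.0f} h")
print(f"share of work to skip for 10,000 h saved: {fraction_for_10k:.0%}")
```

Under these assumptions, saving more than 10,000 hours corresponds to skipping about half of the double-screened workload.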
The deliverable is a plug-in for the overarching ASReview software suite that allows users to select domain-specific hyperparameters. It should include documentation, vignettes, and instruction materials for less experienced users.