Hybrid NLP for Statistics

Interest in using natural-language text as a source for statistics is growing within Statistics Netherlands. Text sources could reduce the number of questions respondents have to answer, or even replace entire questionnaires. They could also improve the quality of existing statistics by combining the information filled out in forms with free-text feedback.

The difficulty of using natural language processing (NLP) at national statistics bureaus lies in two main problem areas. One is bias; the other is that the model has to be explainable, a trait that most deep-learning methods lack. This project addresses this combination of issues, which holds back the use of NLP models for national statistics.

People involved

Paul Keuren