Text Analytics

The research theme Text Analytics investigates utility determinants of natural language processing systems in daily practices from a people-process-technology perspective, using an action research approach. It aims to uncover and relate meaningful information in diverse unstructured and semi-structured textual data sources within complex systems such as Health, Fisheries and Business (e.g. Spruit & Cepoi, 2015). It has been well known since the 1990s that unstructured and semi-structured data constitute around 80% of an organisation’s data volume (Shilakes & Tylman, 1998); however, natural language processing is still considered a wicked problem. One important implication of this incomplete availability of information is its negative impact on decision-making processes. Therefore, we employ a wide range of text analytics techniques, from Natural Language Processing (NLP)-based to Machine Learning (ML)-based approaches and combinations thereof, to meaningfully structure texts. For each technique, we consider system metrics such as effectiveness, efficiency and usability, in addition to performance measures such as accuracy, precision and F-score, within a specific application context to determine a system’s societal impact in daily practices (e.g. Spruit & Vlug, 2015). In particular, we aim to understand the optimal balance between applying ensembles of symbolic NLP approaches on the one hand and probabilistic text analytics techniques on the other.

Current PhD projects in this research theme are Shaheen Syed’s Text Analytics in 21st Century Fisheries (e.g. Syed et al., 2016) and Noha Seddik Tawfik’s Text Analytics in Life Sciences platform SNP Curator (in prep.). The theme is also actively investigated by several other PhD students, among others Vincent Menger in relation to his Psychiatry Research Analytics InfraStructurE (PRAISE), Zhengru (Ian) Shen in the context of his STRIP Assistant 3.0, and Raj Jagesar as part of the BeHapp platform.

Highlighted Papers 
Shilakes, C., and Tylman, J. (1998). Enterprise Information Portals. Merrill Lynch & Co., New York, NY.
Spruit, M., and Cepoi, A. (2015). CIRA: A competitive intelligence reference architecture for dynamic solutions. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (pp. 249–258). KDIR 2015, November 12-14, 2015, Lisbon, Portugal: ScitePress.
Spruit, M., and Vlug, B. (2015). Effective and Efficient Classification of Topically-Enriched Domain-Specific Text Snippets. International Journal of Strategic Decision Sciences, 6(3), 1–17.
Syed, S., Spruit, M., and Borit, M. (2016). Bootstrapping a Semantic Lexicon on Verb Similarities. In Fred, A. et al. (Eds.), Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (pp. 189–196). KDIR 2016, November 11-13, 2016, Porto, Portugal: ScitePress.