Extraction of Knowledge Models from Digital Textbooks

Textbooks are created, structured and formatted by domain experts with the main purpose of explaining knowledge to novices. Authors draw on their understanding of the domain when structuring and formatting the content of a textbook to facilitate this explanation. As a result, the formatting and structural elements of a textbook (section headings, table of contents, index) encode otherwise hidden domain semantics.

In this project, we have developed an approach for the automated extraction of semantic models from textbooks based on their formatting rules and internal structure. These models are linked across multiple textbooks within the same domain to improve their quality, integrated with DBpedia, and enriched with additional semantic information. Concepts that do not belong to the target domain of the textbook are pruned from the models. All of this is done automatically. The resulting models are rich, machine-readable, domain-oriented knowledge graphs that can support semantic and adaptive access to the textbook’s sections, pages and external resources.
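To make the idea concrete, the following is a minimal Python sketch of three of the steps described above: reading concepts from a textbook index, linking them across textbooks, and pruning. It is an illustration under stated assumptions, not the project's actual implementation. The function names (`parse_index`, `link_indexes`, `prune`), the comma-separated index line format, and the pruning heuristic (keep only concepts found in at least two textbooks) are all hypothetical; the real approach relies on formatting rules, DBpedia integration and a domain-specificity analysis that are not reproduced here.

```python
import re
from collections import defaultdict


def parse_index(lines):
    """Parse index lines such as 'variance, 12, 45-47' into {term: [pages]}.

    The line format is an assumption made for this sketch.
    """
    index = {}
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        term, pages = parts[0], []
        for ref in parts[1:]:
            # Accept a single page ('12') or a page range ('45-47').
            m = re.fullmatch(r"(\d+)(?:\s*[-–]\s*(\d+))?", ref)
            if not m:
                continue  # skip anything that is not a page reference
            start = int(m.group(1))
            end = int(m.group(2)) if m.group(2) else start
            pages.extend(range(start, end + 1))
        if term:
            index[term] = pages
    return index


def normalize(term):
    """Lower-case a term and collapse whitespace so labels can be matched."""
    return re.sub(r"\s+", " ", term.lower()).strip()


def link_indexes(indexes):
    """Merge {book_id: {term: pages}} into {concept: {book_id: pages}}."""
    linked = defaultdict(dict)
    for book_id, index in indexes.items():
        for term, pages in index.items():
            linked[normalize(term)][book_id] = pages
    return dict(linked)


def prune(linked, min_books=2):
    """Keep concepts found in at least `min_books` textbooks.

    This is a stand-in heuristic, not the project's domain filter.
    """
    return {c: books for c, books in linked.items() if len(books) >= min_books}


if __name__ == "__main__":
    book_a = parse_index(["Standard deviation, 12, 45-47", "Preface, 1"])
    book_b = parse_index(["standard  deviation, 88", "Median, 30-31"])
    model = prune(link_indexes({"a": book_a, "b": book_b}))
    print(model)
```

In this toy run, only the concept shared by both textbooks survives pruning, and it keeps page references into each book, which is the kind of link that can support navigation from a concept to the relevant pages.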

Figure: Illustration of the process from textbook to knowledge graph.

More information can be found at https://intextbooks.science.uu.nl/ and in the publications listed below.

Related publications

  • Alpizar-Chacon, I., & Sosnovsky, S. (2022). What's in an Index: Extracting Domain-Specific Knowledge Graphs from Textbooks. In Proceedings of The Web Conference 2022 (accepted). New York, NY, USA: ACM Press.
  • Alpizar-Chacon, I., & Sosnovsky, S. (2021). Knowledge models from PDF textbooks. New Review of Hypermedia and Multimedia, 27(1), 1–49.
  • Alpizar-Chacon, I., & Sosnovsky, S. (2020). Order out of Chaos: Construction of Knowledge Models from PDF Textbooks. In Proceedings of DocEng'2020: The 20th ACM Symposium on Document Engineering (Article No. 8, pp. 1–10). New York, NY, USA: ACM Press.
  • Alpizar-Chacon, I., & Sosnovsky, S. (2019). Expanding the Web of Knowledge: One Textbook at a Time. In Proceedings of ACM Hypertext'2019: 30th International Conference on Hypertext and Social Media (pp. 9–18). New York, NY, USA: ACM Press.