Digital corpora

If you are looking for digitized resources, the university library is a good place to start. We hold licenses for a large number of e-books and digital text corpora, and we also offer access to platforms for searching these (and other) digital text corpora. The same applies to a wide range of (audio) visual corpora.

On the Humanities search systems page you can find an overview of the digital text and (audio) visual corpora for Digital Humanities research.

  • Use the text mining tag to show only the corpora that are available for text mining.
  • To find (audio) visual corpora, use the select type filter and choose sound & vision.

Raw data

For a number of text corpora, the library has the raw data available via Yoda. You can then query the raw data using your own tools.
When using these data, please take the privacy statement into account. Only Utrecht University staff may use the data. To request access to the data in Yoda, email bibliotheek@uu.nl.
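
As an illustration of what querying the raw data with your own tools could look like, here is a minimal Python sketch. It assumes you have already been granted access and have copied part of a corpus to a local folder of plain-text (.txt) files; the actual file formats and folder layout in Yoda may differ (for example XML), and the function name count_term and the folder name in the usage comment are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path


def count_term(corpus_dir, term):
    """Count case-insensitive whole-word matches of `term` per .txt file.

    Assumption: the corpus is a local folder of UTF-8 plain-text files.
    Returns a Counter keyed by the file path relative to `corpus_dir`.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for path in sorted(Path(corpus_dir).rglob("*.txt")):
        # Replace undecodable bytes rather than failing on a single bad file.
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = len(pattern.findall(text))
        if hits:
            counts[path.relative_to(corpus_dir).as_posix()] = hits
    return counts


# Hypothetical usage, with a local folder name chosen for illustration:
# for name, hits in count_term("corpus_copy", "railway").most_common(10):
#     print(name, hits)
```

The whole-word pattern means a search for "railway" does not count "railways"; drop the word boundaries if you want prefix matches as well.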

Raw data for the following corpora is available via Yoda:

  • Eighteenth Century Collections Online
  • Guardian & Observer (1791-1909 and 1910-2003)
  • Nineteenth Century U.K. Periodicals, Module 1
  • The Times Digital Archive (1785-2011)
  • Times Literary Supplement (1902-2014)

Do you want access to the raw data we already have, or are you interested in the data of e-books or other text corpora? Then please contact us.

We do not yet hold the raw data for the corpora listed below, but we can always obtain it:

  • Early English Books Online (1473-1700)
  • The Economist Historical Archive (1843-2015)
  • Entertainment Industry Magazine Archive
  • International Herald Tribune Historical Archive (1887-2013)
  • Nineteenth-century U.S. Newspapers