10 May 2019 from 12:30 to 17:30

Joining forces in Utrecht for data science, complex systems, and other data-driven research areas and application domains

Data Science & Complexity Center Summit


The three focus areas Applied Data Science, Bioinformatics and Foundations of Complex Systems invite everyone interested in Data Science, Complexity and related areas to a summit organized by the Data Science & Complexity Center (DSCC).

The first joint DSCC initiative was a seminar series on Machine Learning (November 2018 – April 2019). The summit on 10 May 2019 will conclude this successful seminar series. It will also mark the start of the co-operation between the three focus areas and of the envisaged collaboration with other initiatives in Utrecht, both in the wider field of data science and beyond.

Programme

The summit programme will feature four invited lectures on the application of machine learning in a range of domains. In addition, a breakout session will give people from different disciplines the chance to get to know each other and each other's research interests. It will also offer a taste of how new models for scholarly and scientific work can stimulate interdisciplinary dialogue and provide a powerful instrument for tackling the challenges facing society today.

Schedule
12:30 Welcome with coffee, tea and sandwiches
13:00 Opening
13:20 Lecture Els Stronks: Data Challenge: finding the author of the Dutch national anthem
13:50 Lecture Bas Dutilh: Global phylogeography and ancient evolution of the widespread human gut virus crAssphage
14:30 Break / Bazaar
15:25 Lecture Dong Nguyen: Doing things with words: Large-scale analysis of language in social media
16:00 Lecture Jeannot Trampert: Machine learning for geophysical inference problems
16:30 Wrap-up
16:45 Drinks

Keynote Speakers
Prof. Els Stronks

In her lecture ‘Data Challenge: finding the author of the Dutch national anthem’, she discusses how computational stylometric research is being used to identify the author of the Wilhelmus.

Els Stronks is Professor of Early Modern Dutch Literature and Culture at Utrecht University.

Abstract

The Dutch national anthem, the Wilhelmus, dates from the late sixteenth century and is the oldest national anthem in the world. The song stands out among national anthems for its content. It is not about a country and does not voice the thoughts and ideas of ‘us’ and ‘we’. Instead, it is written in the first person, in the voice of William of Orange, who calls for the Dutch Revolt against Spanish rule.

The song is surrounded by mystery. For centuries, researchers have tried to establish when it was written, whether someone commissioned it, and who wrote it. In recent years, computational stylometric research has given new impetus to the debate about its authorship.

Historically, a handful of authors have been identified as candidates for the authorship of the Wilhelmus, for example Fruytiers, Houwaert, Coornhert and Marnix van St. Aldegonde. Traditionally, most fingers have pointed at Marnix van St. Aldegonde, who was a writer but also a diplomat and confidant of William of Orange. Recent computational stylometric research has produced another suspect: Petrus Datheen, an author who had been added to the test as a control rather than as a plausible contender. Datheen, a Calvinist theologian and writer, had never seriously been considered as the author of the Wilhelmus, probably because of his reputation as a mediocre poet and because of a falling-out he had with William of Orange in 1578.

This stylometric result raises a new and exciting question. Among all the people who were traditionally not considered potential authors of the Wilhelmus, is there someone with an even higher stylometric score than Petrus Datheen? Is someone still being overlooked? And is there a computational method other than stylometry that is better suited to finding the author of the Wilhelmus?

Dr Bas Dutilh

In his lecture ‘Global phylogeography and ancient evolution of the widespread human gut virus crAssphage’, he describes how innovative big data analysis led to the discovery of a virus.

Bas Dutilh is Assistant Professor of Theoretical Biology and Bioinformatics at Utrecht University.

Abstract

Microbiomes are vast communities of microbes and viruses that populate all natural ecosystems. Viruses are rapidly evolving biological entities and have been considered the most variable component of microbiomes. However, recent evidence suggests that the viruses in the human gut are remarkably stable compared to those in other environments.

Using innovative big data analysis, we discovered one of the most abundant and widespread human gut viruses, crAssphage, and predicted its Bacteroides host. Next, through a global collaboratory, we obtained DNA sequences of crAssphage from over one-third of the world’s countries and investigated its origin, evolution, and global epidemiological signature.

We showed that its phylogeography is locally clustered within countries, cities, and individuals. We also found structurally conserved crAssphage-like genomes in non-human primates, including apes, Old World monkeys, and New World monkeys. This challenges the notion of rampant viral genomic mosaicism and suggests that the association of crAssphage with primates may be millions of years old. We conclude that crAssphage is a benign globetrotter virus that may have co-evolved with the human lineage and is an integral part of the normal human gut virome.

Dr Dong Nguyen

In her lecture ‘Doing things with words: Large-scale analysis of language in social media’, she will highlight the pros and cons of using social media data and present a recent study.

Dong Nguyen is a research fellow at the Alan Turing Institute in London and is affiliated with the Institute for Language, Cognition and Computation at the University of Edinburgh. In May 2019 she will join the Department of Information and Computing Sciences at Utrecht University.

Abstract

Social media provides the opportunity to study language use in a variety of social situations on a very large scale. However, the language of social media differs substantially from that of many corpora used to develop natural language processing tools. To fully leverage the potential of social media data, new computational approaches are needed.

In this talk, I will first highlight both the opportunities and difficulties that arise when working with social media data. Next, I will present a recent study in which we track changes in the meanings of words in a Twitter corpus spanning 5.5 years.

Prof. Jeannot Trampert

In his talk ‘Machine learning for geophysical inference problems’, he will show how machine learning can be applied to problems ranging from mapping Earth's internal discontinuities to earthquake early warning.

Jeannot Trampert is Professor of Seismology at Utrecht University.

Abstract

All Earth-related data are recorded at or above Earth's surface. Geophysics aims to relate these data to Earth's internal structure or processes, most often by inferring model parameters with the wave or Navier-Stokes equations. Because uncertainty matters, most of these inference problems are solved in a Bayesian framework. Over the last 10 years, I have explored to what extent machine learning can be used for various such problems.

All model parameters are related non-linearly to the observations. The most general way to implement the Bayesian framework is therefore to use some form of Monte Carlo sampling. I will introduce the concept of prior sampling, a Bayesian framework based on mixture density networks, and show how these particular neural networks can be used to solve problems ranging from mapping Earth's internal discontinuities to earthquake early warning.

Data Science & Complexity Center (DSCC)

The Data Science & Complexity Center (DSCC) is a collaboration between the various initiatives within Utrecht University (UU) in the field of data research and complexity, including the focus areas Applied Data Science, Bioinformatics and Foundations of Complex Systems. DSCC builds cross-connections, strengthens the potential for multidisciplinary collaboration, and ensures coherence in the education UU offers in the disciplines involved.

Start date and time
10 May 2019 12:30
End date and time
10 May 2019 17:30
Entrance fee
Free
More information
dscc@uu.nl