Data Science & Artificial Intelligence for Society Day
All researchers and master's students at Utrecht University are invited to the Data Science & Artificial Intelligence (AI) for Society Day on June 30th. The day is hosted by the three focus areas Applied Data Science, Governing the Digital Society and Human-centered Artificial Intelligence in partnership with the FAIR Research IT Programme. The goal is to provide all researchers connected to the three thematically close focus areas as well as to FAIR Research IT an opportunity to connect, discuss and exchange thoughts and ideas.

Why you should come
At the Data Science & AI for Society Day you will get insights into the research and collaborations of your colleagues, which will inspire you for your own research. You will also learn more about the three focus areas and you will be able to explore the options for cooperation, support, and funding. Every year, the focus areas issue calls for proposals for seed funding for new research initiatives.
If you are already involved in a focus area, you will have the opportunity to see what the other focus areas have to offer and you can explore whether affiliation with another focus area would be valuable. Or you will be inspired to collaborate with colleagues from another focus area.
The conference will start at 1 pm (walk-in lunch starts at 12 noon) and takes place in the Marinus Ruppert building. It will include plenary sessions (in the lecture hall ‘Ruppert Wit’) and breakout sessions (in various tutorial rooms in the Ruppert building).
Programme
You will be welcomed by José van Dijck and Henk Kummeling. During the first plenary session Nadya Purtova and Joeri Zwerts will show you what data science and artificial intelligence mean to their research.
After this part of the programme, several break-out sessions will be organised. At 3 pm the second plenary session will be held by Herbert Hoijtink and Natasha Alechina.
After another round of break-out sessions, Jurgen Moers will wrap up the conference and Jan Broersen and Mehdi Dastani will sketch future perspectives.
Lunch
12.00-13.00 Walk-in lunch
Plenary session #1
13.00-13.05 Welcome – plenary opening by José van Dijck
13.05-13.15 Henk Kummeling
13.15-13.30 Plenary speaker #1: Nadya Purtova
13.30-13.45 Plenary speaker #2: Joeri Zwerts
13.45-13.50 Short break, moving to breakout sessions
Breakout sessions
13.50-14.15 Breakout sessions #1
14.15-14.20 Short break; breakout session switch
14.20-14.45 Breakout sessions #2
14.45-15.00 Break, moving to plenary session
Plenary session #2
15.00-15.15 Plenary speaker #3: Natasha Alechina
15.15-15.30 Plenary speaker #4: Herbert Hoijtink
15.30-15.35 Short break, moving to breakout sessions
Breakout sessions
15.35-16.00 Breakout sessions #3
16.00-16.05 Short break; breakout session switch
16.05-16.30 Breakout sessions #4
16.30-16.35 Short break, moving to plenary closure
Plenary closure
16.35-16.45 Jurgen Moers – wrap-up conference
16.45-16.55 Jan Broersen and Mehdi Dastani – conference closure, sketch future perspectives
Drinks
Research for Governing the Digital Society. A stakeholder-oriented approach of investigation and intervention
Mirko Tobias Schäfer (Utrecht Data School)
Ruppert 029
Utrecht Data School (UDS) is a teaching and research platform at Utrecht University. Committed to socially engaged research and education, UDS works at the cross-section of different disciplines and between academia and societal sectors. In this presentation, Mirko Tobias Schäfer speaks about developing socially engaged research projects and facilitating effective knowledge transfer. While this practice of direct engagement provides great insight into the impact of datafication and algorithmisation, it also calls for revisiting the role of researchers and the university in the knowledge economy.
Note that this session replaces the session Comparing Key Concepts: ‘Algorithmic Condition’ and ‘Digital Society’.
Employing text classification models for journalistic inquiry on public debates
Joris Veerbeek (Text Mining; Applied Data Science)
Ruppert 031
Joris Veerbeek will present his PhD project, which aims to (a) develop and test strategies for using automatic text classification in a journalistic setting, and to (b) use text classification models to study public debates and the dynamics of social media. The project is co-financed by the Dutch journalistic weekly De Groene Amsterdammer.
Reproducible research in R using the Workflow for Open Reproducible Code in Science (WORCS)
Neha Moopen (Open Statistical Software JASP and R; Applied Data Science)
Ruppert 032
One aspect of Open Science is making your research open and reproducible. But how? Best practices for reproducible science include several tools and practices that you may never have used or even heard of before. Are you using version control? How are you managing your dependencies? Is your manuscript a dynamic document? This breakout session introduces attendees to the Workflow for Open Reproducible Code in Science (WORCS), an R package that offers a streamlined workflow to make the entire process of your analysis, from study planning to the submission of your manuscript, reproducible.
Cognitive diagnostic assessment in university statistics education
Lientje Maas (Learning Analytics; Applied Data Science)
Ruppert 033
E-learning is increasingly used to support student learning in higher education, facilitating administration of online formative assessments. Based on students’ item response data from these assessments, diagnostic information can be obtained to provide effective feedback to students via learning dashboards. During this breakout session, it will be discussed how diagnostic classification models can be used to obtain valid and reliable measurements of students’ skill mastery in the domain of statistics education, and how this information can be used to provide tailored support to individual students.
Using active learning to reduce the costs of population-based neuroimaging studies
Hugo Schnack, Thomas Kok and Georg Krempl (Machine Learning Applications; Applied Data Science)
Ruppert 111
Finding diagnostic/prognostic neuroimaging biomarkers for mental disorders requires large sample sizes. This means that many healthy individuals and patients need to be scanned, which can be a burden for patients; in addition, the acquisition and processing of MRI brain scans is time-consuming and expensive. In this study we investigate whether acquisition of MRI brain scans could be done more efficiently, i.e., only scanning individuals that are most informative for the identification of biomarkers.
Some data analysis applications involve datasets in which explanatory variables are expensive or tedious to acquire (e.g., MRI brain scans), but auxiliary data (e.g., demographics) are readily available and might help to construct an insightful training set. In the active learning literature, this problem has not yet been studied, despite promising results in related problem settings that concern the selection of instances or instance-feature pairs. Therefore, we formulate this complementary problem of Active Selection of Classification Features (ASCF): given a primary task of learning a classification model based on expensive features x, the ASCF task is to use a set of readily available selection variables z to select those instances that will improve the primary task’s performance most when their expensive features x are acquired and added to the primary training set.
We propose an approach for this problem and evaluate its performance on public real-world benchmark datasets. In addition, we illustrate the use of this approach to efficiently acquire MRI scans in the context of neuroimaging research on mental disorders, based on a simulated study design with real MRI data.
Introduction SIG Inclusion in the Datafied City
Michiel de Lange (Inclusion in the Datafied City; Governing the Digital Society)
Ruppert 114
Modern cities are datafied cities. The Special Interest Group Inclusion in the Datafied City researches how data can contribute to strengthening civic participation and public values in the smart city. It has been noted that processes of datafication tend to go hand in hand with mechanisms like commodification and (social) selection. Oftentimes, the uses of data tend to promote the specific interests of some stakeholders at the expense of other people, or societal interests at large. Hence, the ongoing datafication of city life poses a range of urgent threats to civic inclusiveness.
How and in what way could online advertisement become more (of a data) common?
Gijs van Maanen, Anne Helmond and Fernando van der Vlist (Part of the seminar series on data commoning; Governing the Digital Society)
Ruppert 029 (Note: this session continues until 16:00)
This presentation considers the significance of data partnerships in the integrated platform ecosystem of social media and digital advertising. More specifically, it explores how partners mediate and shape platform power through infrastructure development for the creation, commodification, analysis, and circulation of data audiences. We argue that partners play a key role in the technological and organisational process of platformization, driving the technological expansion and economic growth of digital platforms into other markets, industries, and societal domains through data partnerships. Partners include leading audience intermediaries that shape the creation, buying, modelling, and targeting of data assets and build the business-to-business relationships that integrate social media with the digital advertising market. We present an empirical study of the partnerships of leading social media to situate them in the audience (data) economy and identify aspects of platform power.
The presentation highlights the strategic importance of partnerships in the processes of assetization and platformization. Ultimately, our contribution situates platforms, their data assets, and their sources of power within an integrated platform ecosystem. How to think about data commons in relation to the digital advertising industry?
Respondent: Natalia Avlona, https://dcode-network.eu
This is a hybrid session and part of a seminar series. It is also possible to join digitally on Microsoft Teams.
Presentation of the Principles by Design SIG and a short introduction to the upcoming ICRES “Sustainable Ethics and standards in the design and regulation of AI” Conference
Karin van Es, Lucky Belder & Machiko Kanetake (Principles by design; Governing the Digital Society)
Ruppert 031
Data and algorithms are the result of human decisions and values, but also of the prejudices that underlie them. Given this, there is a need to develop good data practices. The Special Interest Group Principles by Design deals with questions concerning 'good data' practices. In the presentation we present the topics currently pursued by the SIG:
- the challenges that value-sensitive data practices can pose to EU-based institutions’ openness in international research and innovation
- designing value-driven recommendations
- the opposition between big data and human expertise in data discourse
We will provide a short introduction to the upcoming ICRES “Sustainable Ethics and standards in the design and regulation of AI” Conference we are hosting in 2023.
Embodied AI: Virtual Humans and Social Robots
Zerrin Yumak, Ruud Hortenius, Maartje de Graaf (Autonomous Intelligent Systems; Human-centered Artificial Intelligence)
Ruppert 032
The main objective of the Embodied AI initiative is to unite researchers of Utrecht University in the field of Embodied AI: virtual humans and social robots that can engage in face-to-face social interactions with people using verbal and non-verbal behaviours. With this initiative, we aim to increase visibility, build community, and foster interdisciplinary collaborations as well as encourage diversity and inclusion of perspectives and backgrounds. In this session, you will get an overview of the Embodied AI initiative, have an interactive discussion on the challenges and opportunities of this research field, and learn how this initiative could enhance and inspire your research.
Explainable artificial intelligence in medical image analysis
Bas van der Velden and ADS seed grant recipients (Imaging; Applied Data Science)
Ruppert 033
SIG Imaging will deliver a keynote presentation on “explainable artificial intelligence in medical image analysis” by Bas van der Velden. Furthermore, multiple ADS seed grant recipients will present how the seed grant influenced their research.
Mine-Well - Using process mining and survey research to study employees’ well-being and performance
Sven Lugtigheid (Data-Driven Work Innovation; Applied Data Science)
Ruppert 111
Work processes are key assets of organizations. To deliver a valuable outcome for customers, people instrument such processes with the support of information systems in organizations. Hence, the well-being of people within organizations is an important factor to which organizations pay special attention. In this collaborative project between the UU Future of Work Hub (https://www.uu.nl/en/research/institutions-for-open-societies/future-of…) and the UU Special Interest Group on Data-driven Work Innovation, UU researchers study the connection between employee well-being and the performance of processes using the execution data of these processes.
High Performance Computing - transcend the limits of your own computer
Roel Brouwer and Jelle Treep (Fair Research IT)
Ruppert 114
Are you running into limits of the computational power on your own machine? Do you need more (working) memory or GPUs than you currently have available? Do you want to offload work to an external machine? You might want to consider "High Performance Computing".
For the purposes of this workshop, we consider the term "High Performance Computing" (HPC) very broadly: any case where a user will work on a system other than their own computer. A number of scenarios where this may be useful, or even necessary, will be discussed. We will try to provide answers to the following questions:
- When do you need (or want) to use HPC facilities?
- What do you need to make this work?
- Where can you get advice on what to do and support in setting things up?
- What opportunities are there to get funding for HPC projects?
Common patterns of medication trajectories in patients with multiple chronic conditions.
Daniala Weir and David Liang (Applied Data Science)
Ruppert 116
Polypharmacy (use of multiple medications) is common among older adults living with multiple chronic conditions and increases the risk of harmful adverse drug events. There is uncertainty around safe medication prescribing practices for individuals living with multiple chronic conditions because older adults with complex medical conditions are excluded from clinical trials of medications. As a result, clinical guidelines typically focus on the management of a single disease, and do not address how to optimally integrate care for individuals whose multiple conditions may make following guideline recommendations for any single disease harmful. The first step in addressing this significant evidence gap is to better understand and characterize common patterns of multi-drug regimens over time. We will analyze data from the Clinical Practice Research Datalink (CPRD) for this project. The CPRD includes all de-identified electronic health records from the patient population of consenting UK general practices (~60 million patients). Data are collected through the coded primary care record and include demographics, medication prescription details, clinical events (symptoms and diagnoses), clinical lab tests, lifestyle indicators, hospital admissions and major outcomes and details relating to death. In preliminary analyses, we focus on patients with type 2 diabetes and use k-medoids clustering with Dynamic Time Warping (DTW) distance to identify common medication trajectories.
Applied AI in Eye Care
Robert Wisse (remote) (Clinical Data Applications; Applied Data Science)
Ruppert 119
No description available.
Towards a reliable stopping criterion for your review; a comparison of existing methods and future directions
Michiel Bron (Active Learning; Applied Data Science, Human-centered Artificial Intelligence)
Ruppert 031
Technology-Assisted Review systems that use Active Learning (e.g., ASReview) help you identify relevant records in a dataset as early as possible.
However, these systems often struggle with determining the recall (i.e., whether all relevant records have been found) during the review process. This makes it hard to establish when the user can stop the review process. This presentation will discuss the challenge of finding an informative, robust, and efficient stopping criterion and discuss the current state of the art.
Automating Cognitive Mapping Text Analysis: The Opportunities and Limitations of a Machine-learning Approach
Femke van Esch (Text Mining, Applied Data Science)
Ruppert 032
Automatic text analysis has great benefits over manual text analysis in terms of speed and reliability. However, despite great advances in the domain, the word-based nature of such techniques makes them unable to represent deeper meanings in text. The technique of Cognitive Mapping (CM) is able to derive deeper meaning representations from texts, like argumentative or causal reasoning patterns, but still relies on manual coding (Bougon et al 1977; Boukes et al forthcoming; Van Esch & De Jong 2019). Rather than being word-based, CM analysis focusses on causal and normative relationships between concepts (Axelrod 1976; Young 1996; Van Esch 2007). Whereas this feature enables the deeper analysis of text, it also makes the automation of the technique a challenge. During the first part of our project, we were able to obtain an additional grant from the Netherlands eScience Center and we worked together with their Research Software Engineers to explore the potential of automating the Cognitive Mapping text analysis technique using state-of-the-art machine learning methods. Our findings indicate that the different tasks involved in CM are very difficult machine learning tasks: as of yet the predictions from the machine learning models do not come close to the labels given by the human coder. However, integrating the machine learning with some rule-based elements did show promise. We will therefore explore a more rule-based approach in step two of our project.
Why not teach open software?
Hugo Quené (Open Statistical Software JASP and R, Applied Data Science)
Ruppert 033
One aspect of Open Science is to apply open and transparent statistical analyses, using open-source software such as JASP and R for reproducible analyses. Teaching students how to use such software, in our own classes and courses, is an important step towards that goal. So what is holding us back? What factors are at play in teachers’ choices for open-source or closed-source software in the classroom? In this discussion session we will try to identify these factors, such as time investment, intellectual satisfaction, familiarity, (conformity to) outside expectations, etc.
AI for an Open Society: Contributions from the Social Sciences and Humanities
Chris Janssen, Leendert van Maanen and Dominik Klein (Social and Cognitive Modeling, AI and Behavioural Change, Knowledge Representation and Reasoning; Human-centered Artificial Intelligence)
Ruppert 111
Artificial intelligence (AI) models, technology, and methods are heavily influencing society. This happens through both informal and formal institutions. For example, for informal institutions, algorithms estimate what people ‘like’ and dynamically adapt what posts they see on social media and even news websites. This can create ‘filter bubbles’ and a societal divide. On the formal side, models are one of the instruments to inform institutional and governmental policy and to create behavior change. For example, cognitive models of human attention can inform policy on how frequently to display speed information above Dutch highways, and agent-based models of social interaction can help understand how quickly diseases spread through a network and how this can be slowed down. In essence, AI models impact society at various levels. To shape tomorrow’s society, research needs to investigate how the different types of models work, and how they can impact human behavior, and thereby society. This session is organized by three of the SIG leaders of the HAI special interest area, who also jointly set up a new BSc course on AI for an Open Society.
AI in Dutch primary classrooms: (Re)valuing Teachers’ Professional Autonomy
Niels Kerssens (Platformisation in education; Governing the Digital Society)
This session is cancelled.
Infection Transmission Ontology: Standardization of Infection Transmission Data
Egil Fischer and Elena Slavco (Fair Research IT)
Ruppert 116
Data on the transmission of infections in animals and humans enable the quantification of crucial parameters, like the reproduction number. Datasets on transmission in small groups vary in format, structure, syntax and semantics. It is difficult to combine these heterogeneous datasets, for example, to do re-analysis and meta-analysis.
In this collaborative study involving both domain experts and data scientists, we integrated existing datasets on transmission experiments by using semantic web technologies. We have developed an Infection Transmission Ontology that describes the main concepts and relations in the domain of transmission of microorganisms between hosts. By mapping existing datasets to the concepts in the Infection Transmission Ontology, the datasets are standardized and can be combined. We successfully applied the ontology to four transmission datasets and further analysed them in a meta-analysis.
Publishing personal data
Jacques Flores and Dorien Huijser (RDM Support)
Ruppert 119 (Note: this session continues in break-out session 4 and lasts until 16:30)
This session will introduce you to the General Data Protection Regulation (GDPR) and how it applies to research data. You will learn about available research data management tools and how to comply with privacy regulations.
Machine learning applications to ethics and law
Henry Prakken, Arno Siebes (AI, Ethics and Law; Human-centered Artificial Intelligence)
Ruppert 114
This slot aims to bring together researchers from various disciplines to discuss the prospects for research projects on applying machine learning to ethics or law. Among the possible topics are:
- legal case outcome prediction
- legal text classification
- legal information retrieval
- ethical decision making
- learning ethical values from stories
- machine learning and data protection law
- ethical aspects of machine learning for decision making
An open and transparent benchmark platform for information retrieval via active learning.
Jelle Teijema and Jonathan de Bruin (Fair Research IT/Active Learning; Applied Data Science, Human-centered Artificial Intelligence)
Ruppert 029
The Open Data Systematic review Simulation platform (ODSS) is an open-source tool that uses FAIR datasets to run ASReview simulations of active learning-based systematic reviews to benchmark classification models, feature extractors, datasets, and other parts of the active learning pipeline. We will discuss the challenges of such a benchmark platform and how to overcome them, what is important in benchmarking models and datasets, what makes an inclusive collection of datasets, how to adhere to FAIR data principles while performing dataset modifications, and other topics.
Introducing a new SIG: Governing the social media and data economy
Catalina Goanta, Anne Helmond, Nadya Purtova (Governing the social media and data economy, Governing the Digital Society)
Ruppert 031
How can we understand the mechanisms, participants, infrastructures, and governance of the data economy? How are social media platforms and app stores governed in terms of monetization, business models, and user production?
This Special Interest Group (SIG) aims to act as a multidisciplinary platform that sheds light on contemporary social media and data economies by focusing on: social media economics, app store economies, their various monetization models and data infrastructures, influencer and other platform cultures, as well as emerging socio-legal norms, reflected in aspects such as business practices, publicness, technologies, creator perceptions, or content regulation.
Explainable artificial intelligence in medical image analysis
Bas van der Velden and ADS seed grant recipients (Imaging; Applied Data Science)
Ruppert 032
SIG Imaging will deliver a keynote presentation on “explainable artificial intelligence in medical image analysis” by Bas van der Velden. Furthermore, multiple ADS seed grant recipients will present how the seed grant influenced their research.
ML models for clinical practice
O’Jay Medina and Sjoerd de Vries (Clinical Data Applications; Applied Data Science)
Ruppert 033
No description available.
‘Speed date’ for joint GDS-HAI SIG “Government, AI and behavior”
Stephan Grimmelikhuijsen (Government, AI and behavior; Governing the Digital Society, Human-centered Artificial Intelligence)
Ruppert 111
The joint SIG is still in a developmental stage and is looking for potential collaborators/members. I will present the main goals, plans and ambitions of the SIG in 5 minutes and then we will discuss possible avenues for collaboration with those in the room.
IRAS: Objective measurements of subjective environments – why, what and how?
Anke Huss (Fair Research IT)
Ruppert 114
Applying Machine Learning to enable identification of “true” environments.
AI and the Future Workplace: Computational, Ethical and Regulatory issues
Ioanna Lykourentzou (AI in Cultural Inquiry and Art; Human-centered Artificial Intelligence)
Ruppert 116
As work is becoming more complex and interconnected, algorithms are assuming an increasingly central role in its management. The goal of these algorithms is to track the digital workers' performance and decide where, when, and with whom each person should collaborate, and on which task, in order to optimize the speed, quality, or cost of the final work outcome. This tight algorithmic control can limit the breadth of tasks that can be accomplished, stifle creativity, and leave a disproportionate number of digital workers exposed to possible algorithm biases. Although online work came with the promise of flexibility, empowerment, and personal development opportunities, it risks evolving into an exceptionally confined and isolated environment in which algorithms direct and workers execute. What directions, if any, should AI take, and how can they be influenced for a fairer, more ethical, and more sustainable digital workplace? What is the role of citizens, workers, scholars, regulators, and public bodies in the growing discussion around algorithm transparency, explainability, and contestability applied to the field of online labor?
Publishing personal data
Jacques Flores and Dorien Huijser (RDM Support)
Ruppert 119 (Note: this session starts in break-out session 3 and lasts until 16:30)
This session will introduce you to the General Data Protection Regulation (GDPR) and how it applies to research data. You will learn about available research data management tools and how to comply with privacy regulations.
Registration
Are you interested in one of these subjects and curious what other topics will be discussed during the Data Science & AI for Society Day?
Sign up