22 February 2019 from 15:00 to 17:00

DSCC Central Topic Seminar #8: Machine Learning Applications in Data Management

The Data Science & Complexity Centre (DSCC) Central Topic Seminars are a series of seminars co-organized by Utrecht Applied Data Science, the Utrecht Bioinformatics Center, and the Centre for Complex Systems Studies. The series consists of tutorials, excursions, software training and specialist lectures. We aim to introduce the central topic "Machine Learning" to the broad community within and outside Utrecht University and to bring researchers from different backgrounds together.

In the first hour, Dr. Gian Michele Innocenti from the European Organization for Nuclear Research (CERN) will give a specialist lecture on Machine Learning Applications in Data Management titled "Machine learning techniques in High Energy Physics".

Abstract: The goal of high energy physics is to investigate the properties of the most fundamental forces of nature in large particle accelerators like the Large Hadron Collider (LHC) at CERN. Processing and analyzing hundreds of petabytes of data produced in these experiments represents one of the most extreme technological challenges in the field of data science. Modern techniques of machine learning are successfully applied to study the complex system of particles created in hadronic collisions and to isolate rare signals in an environment characterized by large background contamination. In this seminar, an overview of the most advanced techniques of machine learning and deep neural networks used in high energy physics will be given, with a specific focus on the tools developed for the study of heavy-ion collisions at the LHC.

In the second hour, Dr. Miel Hostens from Veterinary Medicine will give a specialist lecture on Machine Learning Applications in Data Management titled "Prediction of metabolic clusters in early-lactation dairy cows using machine learning models based on milk biomarkers".


WHY - As a major livestock producer, the European Union is directly affected by the global need for more sustainable food production. Climate change will undoubtedly affect farm animal production, and the health and welfare of livestock are also of increasing public concern. Due to the rapid development of precision livestock farming technologies and the availability of high-throughput data from milk sensors, large-scale data have become available on research farms. The preferred matrix for measuring biomarkers is milk, as it is more accessible than blood and allows low-cost, automated repeat sampling using ‘in-line’ sampling and analytical technologies.

WHAT - Certain milk components such as N-glycan structures (BM-1), metabolites (BM-2) or mid-infra-red spectra (BM-3) can serve as biomarkers to predict production efficiency and disease. Data mining and machine learning can unlock insights around such biomarkers. As more datasets of these types become available in the near future, scalable data mining and prediction pipelines applied to animal science are needed.

TAKEAWAYS - In this session you will learn: 

  • The methodology for ranking multiple biomarkers according to their predictive power; 
  • Data processing and statistical modelling performed using Spark v2.3.0 with the Scala API;
  • Infrastructure, configuration, and implementation of the data pipeline using sliding windows with Apache Spark’s MLlib.

Both students and staff are welcome. Drinks will be served between the two lectures and afterwards.

DSCC Central Topic Seminars:

Some of the previous seminars are available to watch via the CCSS YouTube channel.

Venue: Room 2.01, Minnaert Building, Leuvenlaan 4, De Uithof, Utrecht.

Please register before Thursday 21 February 2019.
