Courses

This Master's programme starts with two compulsory options that lay the foundations in bioinformatics and biocomplexity. As there are no tracks, you can mix and match courses in the electives phase to give your Master's a stronger bioinformatics or a stronger biocomplexity (modeling) flavour. The type of internship you choose also shapes your specialisation and the area of research that suits you best. Together with the Programme Coordinator, you will determine the optimal study path.

Compulsory courses 

The compulsory course work of this programme consists of: 

  • Essentials (4.5 EC)

In addition, one or more courses with a minimum of 5.5 EC should be selected from the elective courses.

 Plus choose one of the compulsory options below, in consultation with the coordinator and based on your prior knowledge:

  • Option 1: Biological Modeling (5.0 EC) 
  • Option 2: Bioinformatics and Genomics (5.0 EC)

See the menu below for all course options and course descriptions. For a description of the full curriculum, please see the Study Programme page.

Compulsory courses (15 EC)

BiBC Essentials (compulsory)

Description of content

In this six-week course the most essential skills for Bioinformatics and BioComplexity will be taught. Short lectures alternate with hands-on computer practicals and assignments.

Topics that will be covered are:

  • Using the Linux command line and tools
  • Building analysis pipelines
  • Introduction to the many algorithms (and tools) that are used in Bioinformatics and BioComplexity, among which are: Unix workflows, version control, unsupervised learning, parameter fitting, reduction, supervised learning and spatial models

Literature/study material used:

Will be provided as course materials.

Bioinformatics and Genomics (compulsory)

Content:

In this course, attention is paid to understanding and working with the large amounts of data that have been obtained in recent years through genetic and molecular research. These technological developments require new skills and concepts to be able to understand and conduct life science research. You will work successively with mutation data and sequencing data. We study regulatory networks and how the effect of mutations in proteins can be better explained through evolution. The Python programming language is used for scripting throughout the course.

Keywords: (cancer) genomics, regulome, transcriptome, proteome, genome and protein evolution, SNP / CNV calling.

Introduction to (Biological) Modelling (compulsory)

The modeling of real biological systems can aid greatly in the understanding of the behavior of such systems, and in predicting how they will behave under all kinds of circumstances. In this course, we will study how to build models using differential equations, and how to analyze their behavior.
We will use a context of diverse examples from biology, including ecological growth, predator-prey systems, enzyme reactions, genetic regulation, animal coat patterns, and firing neurons.
Models are built from the ground up, using biological knowledge and mathematical tools, enabling the students to gain the experience necessary to build their own models, analyze them, and evaluate their worth.
The course runs for 5-6 weeks in period 1, and is a combination of lectures, tutorials, and computer practicals on your own laptop using Mathematica. No previous experience with Mathematica is required (we will start the practicals with a Mathematica tutorial). The practicals account for 20% of your grade. A final written test at the end of the course accounts for 80%.
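
As an illustration of the kind of differential-equation model mentioned above, the sketch below integrates logistic population growth, dN/dt = r N (1 - N/K). It is written in Python rather than the Mathematica used in the practicals, and the function names, parameter values and the choice of a fourth-order Runge-Kutta integrator are assumptions made for this example, not course material.

```python
import math


def logistic_rhs(n, r, k):
    """Right-hand side of the logistic equation dN/dt = r * N * (1 - N / K)."""
    return r * n * (1.0 - n / k)


def integrate_logistic(n0, r, k, t_end, dt=0.01):
    """Integrate the logistic equation from t = 0 to t_end with classical RK4."""
    steps = int(round(t_end / dt))
    n = n0
    for _ in range(steps):
        s1 = logistic_rhs(n, r, k)
        s2 = logistic_rhs(n + 0.5 * dt * s1, r, k)
        s3 = logistic_rhs(n + 0.5 * dt * s2, r, k)
        s4 = logistic_rhs(n + dt * s3, r, k)
        n += dt * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return n


def logistic_exact(n0, r, k, t):
    """Closed-form solution, used here only to check the numerical result."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-r * t))


# Illustrative values: 10 individuals, growth rate 0.5, carrying capacity 1000.
numeric = integrate_logistic(10.0, 0.5, 1000.0, t_end=10.0)
exact = logistic_exact(10.0, 0.5, 1000.0, 10.0)
```

Comparing `numeric` with `exact` shows how a numerical analysis can be validated against a known solution before the same integrator is applied to models without one, such as predator-prey systems.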

Electives

Advanced R for Life Sciences: In-depth Techniques for analysis, visualization and publishing

Many researchers will need to apply statistical analysis in their work. Often, the R statistical language is chosen, since it is well established, free, and has many packages available for different tasks. This course is for you if you want to use the more powerful features of R, create visually attractive figures with ggplot, write concise and organized code that you can share with others, and create automatically generated reports. It gives you the knowledge needed to follow one of the subsequent courses on statistical analysis for omics technologies and on linear models with R.

Literature/study material used:
Provided during the course. Students are required to bring a laptop to work online via a web page.

Advanced Omics for Life Sciences

Period: course moved to P3. 4 April - 8 April 2022

Lecturers:

Invited speakers (differs between years), scheduled speakers listed below:
(Cuppen) Francis Blokzijl, UMC Utrecht, 10% (DNA)
(Veldink) Wouter van Rheenen, UMCU / Rudolf Magnus, 10% (RNA)
(Heck) Maarten Altelaar, Utrecht University, 10% (Protein)
(Verhoeven-Duif) Judith Jans, UMC Utrecht, 10% (Metabolic)
 
Course description:
The correct analysis and integration of omics data has become a major component of biomedical research. The advances in technology have allowed for more sophisticated and unbiased approaches to assess the different omics data types. Large collaborative projects combined with databasing efforts have led to invaluable resources like ENCODE [https://www.encodeproject.org/], Expression Atlas [https://www.ebi.ac.uk/gxa/home], the Human Protein Atlas [http://www.proteinatlas.org/] and KEGG [http://www.genome.jp/kegg/]. These resources can provide valuable insights into your omics data and serve as a validation or quality control set when used appropriately. The challenge is to effectively analyze omics data and these large online resources after performing an experiment or getting clinical results.
For example, when analyzing tumors derived from a set of patients, the question is: how to correctly analyze your OMICs data and leverage public data by comparing these against your own data. The Cancer Genome Atlas alone numbers over 50,000 files from 3 different OMICs types. What are the correct and feasible strategies to utilize these data?
In this course a scientist (active within the respective OMICs field) starts the morning with a lecture; the accompanying scientific article will be available for prior reading. The presenter will introduce a recent study performed within their group and outline the data mining and data integration opportunities and issues they encountered. The lecture is followed by a discussion on how to conduct this research and possible approaches to expand on the current work or solve one of the encountered issues. Topics covered will include mutation analysis, expression profiling, protein abundance and metabolic pathways. In the afternoon students will be tasked with finding a solution to a challenge set by the presenter. Solving such problems can only be done through writing (small) computer programs and integrating relevant data sources.
This course is suitable for students who take an interest in informatics and biomedical application of informatics. The course builds on the skills acquired in introductory programming courses; having completed one of these is a hard prerequisite. Following the "Introduction to Bioinformatics for Molecular Biologists" course is highly recommended.
The goal of this course is to outline current omics analysis methods and the challenges and value of integrating public data in life science research. We will discuss state-of-the-art approaches for tackling these challenges. Students from other disciplines and other universities are invited to attend this course. The topic is suitable for all students in the life sciences dealing with OMICs data.

Literature/study material used:
Lectures, Scientific articles, Course laptop (students can bring their own), Online resources and documentation, Online tutorials, Unix operating system, Online discussion and Q&A platform.

Registration:
You can register for this course via Osiris Student. More information about the registration procedure can be found here on the Studyguide.
Maximum capacity is 25 participants.

Mandatory for students in own Master’s programme:
No.

Optional for students in other GSLS Master’s programme:
Yes, especially CSDB and MCLS students.

Prerequisite knowledge:
Introduction to Python, R, or another programming language.

Advanced Bioinformatics: data mining and data integration for life sciences

Period: Moved to Period 3. 21 March - 25 March 2022.

Faculty
Adrien Melquiond, dLAB, CMM/Genetics (course coordinator: a.s.j.melquiond@umcutrecht.nl)
Pjotr Prins, Biomedical genetics/Genetics

Invited speakers are different each year.

Description of content
Effectively mining and integrating data is one of the major challenges in biomedical research. Decades of research have led to an accumulation of databases world-wide, including important resources such as NCBI, KEGG, ENCODE and SWISS-PROT. Lately, new data acquisition technologies, especially next generation sequencing (NGS), are rapidly increasing the amount of information available online, from data published with papers all the way to large-scale collaborations such as The Cancer Genome Atlas (TCGA), which involves a wide range of hospitals and research groups offering information on patients, diagnostics and treatments together with data on sequenced tumors, gene expression, methylation, etc.

The challenge is to effectively mine resources such as the TCGA after performing an experiment or getting clinical results. For example, if you are sequencing tumors of cancer patients, the question is how to mine this public data and compare the results against your own data and results. TCGA alone numbers over 50,000 files, so there is no way to mine this data by hand. Likewise, we have access to 1,000 public genomes and the Genome of the Netherlands (GoNL). What are feasible strategies for using this data?

In this course each morning starts with a lecture by a leading biomedical scientist. The topic can be in cancer research, for example diagnostics or personalised medicine. The presenter will tell us about their research and the short-term data mining and data integration issues they are facing. The lecture is followed by a discussion on possible approaches to solving one or more of these issues. Topics covered will include parsing tabular data, SQL databases, web services and the semantic web. For the rest of the day the students will be tasked with finding a solution to a particular problem. Solving such problems can only be done through writing (small) computer programs. This course is suitable for students who take an interest in informatics and biomedical application of informatics. The course builds on the skills acquired in introductory programming courses; having completed one of these is a hard prerequisite. The introduction to bioinformatics course is not a prerequisite but is highly recommended.

The goal of this course is to outline current data integration challenges in biology and biomedical research and discuss state-of-the-art approaches for tackling these challenges. Students from other disciplines and other universities are invited to attend this course. The topic is suitable for all students in the life sciences dealing with NGS data.

Literature/study material used:
Lectures, Scientific articles, Course laptop (students can bring their own), Online resources and documentation, Online tutorials, Unix operating system, Online discussion and Q&A platform.

Registration:
You can register for this course via Osiris Student. More information about the registration procedure can be found here on the Studyguide.
Max 25 students.

Mandatory for students in own Master’s programme
No.

Optional for students in other GSLS Master’s programme:
Yes.

Prerequisite knowledge:
Basic programming knowledge

Analytics and Algorithms for Omics Data

Period (from-till): 7 March - 18 March 2022

Lecturer(s):
Name, faculty/department, participation (%) in course
Dr. Jeroen de Ridder, UMC Utrecht, 100%

Extended course description (for Osiris):
Bioinformatics is at the heart of much of modern genomics research, and encompasses the application of statistics and computer science to (large-scale) biomolecular datasets. In essence, bioinformatics is about smart ways of extracting knowledge from the enormous amounts of data that can be generated using modern measurement techniques. For instance, it plays an important role in finding the genetic origins of various diseases, such as cancer, diabetes or Alzheimer's disease.

In this course we will study some key examples of bioinformatics analyses, i.e. data analytics and computational algorithms, by reading a set of selected papers that present some significant biological conclusions. Instead of the teachers giving lectures about the methodologies, the students are stimulated to read, study and comprehend the available course material. Some lectures will be provided to ensure the basic concepts are clear.

Schedule: In the first week the course runs for five days from 9.00 till approximately 17.00. Each day will include two rounds of paper discussions and two lectures that go into depth on the computational approaches taken. The second week of the course is for proposal writing and peer review of the proposals.

Content:

  • Unsupervised learning, Hierarchical and k-means clustering, spectral clustering
  • Supervised learning, cross-validation, overtraining, Bayes classifier, Random Forest classifier
  • Dimension reduction, PCA, NMF, tSNE
  • Hidden Markov Models, Forward Backward algorithm, Viterbi
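
To give a flavour of the first topic in this list, the sketch below implements k-means clustering (Lloyd's algorithm) in plain Python. The function name, the toy data points and the choice to pass the initial centroids explicitly are assumptions made for this example; it is not taken from the course materials.

```python
def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm for k-means on a list of (x, y) tuples.

    `centroids` holds the initial cluster centres; passing them explicitly
    keeps this example deterministic.
    """
    centroids = list(centroids)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Made-up toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centres, labels = kmeans(points, centroids=[points[0], points[3]])
```

With real omics data the result depends on the initial centroids and on the chosen number of clusters, which is one reason library implementations usually restart from several initialisations.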

Literature/study material used:
Provided course materials (slides) will be made available through our online learning platform: elearning.ubc.uu.nl

Registration:
You can register for this course via Osiris Student. More information about the registration procedure can be found here on the Studyguide.
Bioinformatics Profile students will have priority when this course is followed as a part of their profile.
Thereafter, registration is on a first-come, first-served basis until the maximum number of 20 participants is reached.

Mandatory for students in own Master’s programme:
No

Optional for students in other GSLS Master’s programme:
Yes, especially CSDB and MCLS students.

Prerequisite knowledge:
Basic knowledge of Linear Algebra and Statistics.

Basic Machine Learning for Bioinformatics

Modern biology is largely a data-driven enterprise. We collect genomic information on thousands of patients and matched controls to find genomic causes for illness using GWAS, easily collect expression of (tens of) thousands of genes at different time points and under different experimental conditions to understand what makes a system tick, and with the rise of single-cell omics the datasets are larger and more specific than ever before. Our minds are formidable pattern recognition devices, but they are biased in various ways and not equipped for these huge datasets. How can we use all this data to build good predictive models, or automatically order data so that we can gain new insights?

Enter Machine Learning. A term that calls forth visions of AI overlords for some, but is so much more pedestrian in most of its applications (yet I, for one, welcome our AI overlords, if they so happen to read this). In this course, we start with the basics: what is the difference between supervised and unsupervised learning (and what we want the computer to do in each case), how do we formulate something that the computer can optimise on its own given training data, and how do we then iteratively optimise this? With these basics of cost functions and gradient descent (or more elaborate optimisation methods) under our belt, we then look at several well-known algorithms and implement them ourselves using only Python, numpy and pandas to gain in-depth understanding in the first week, before moving on to the modern scikit-learn library which does all the heavy lifting for you in the second week. We top it off with a group project on a biological dataset where your team tries to build the best classifier for that dataset. Along the way we look at clustering and dimensionality reduction, and gain cursory knowledge of linear algebra, which is the language that machine learning algorithms are formulated in and which you will use to do so as well.
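
The idea of a cost function that is iteratively optimised with gradient descent can be sketched in a few lines. The example below fits a straight line by minimising the mean squared error; it uses plain Python rather than numpy to stay self-contained, and the function name, learning rate, step count and toy data are assumptions chosen for illustration only.

```python
def fit_line(xs, ys, learning_rate=0.05, n_steps=5000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Residuals of the current model on the training data.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the cost J = (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up training data generated from y = 2x + 1, without noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
```

On this noise-free data the fitted `w` and `b` approach 2 and 1. A learning rate that is too large makes the updates diverge, which is one of the practical issues gradient descent raises.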

When you are done with this course, you should be well-equipped to independently learn about more complex classifiers (Random Forests, convolutional neural networks, etc.) or unsupervised methods, and to apply ML to real-world biological problems. This course also lays the foundations for the more theoretical and higher-level understanding you’ll gain in Analytics and Algorithms for Omics Data (BMB508219).

Bioinformatics and Evolutionary Genomics

Currently, molecular biology is generating information on the molecular properties of cells and organisms at an incredible pace. For example, we know the complete genome sequence of an enormous and rapidly increasing number of species. Not only do these high-throughput experiments generate a complete view of the genetic information of cells, other techniques measure the level of expression of all genes at the same time or measure all the interactions between all the proteins present in a cell. Bioinformatics is obviously needed for the storage and primary analysis of these huge volumes of biomolecular data. More interestingly, the data uniquely allows bioinformatics to make biological discoveries that were not possible until now. This course introduces the concepts and approaches required to make evolutionary biological discoveries in this genome-scale data and presents examples of interesting pieces of biology that have been discovered using bioinformatics. Topics to be discussed include genome evolution (as opposed to single genes), and the origins of the eukaryotic cell.

The following subjects are discussed in the course:
1. Sequence homology, Protein domains
2. Gene trees and orthology
3. Genome evolution: evolution of the presence of genes, evolution of gene order
4. Formalizations of gene function (e.g. Gene Ontology)
5. Introduction to high-throughput (HTP) techniques such as micro-array, ChIP-on-chip and yeast-2-hybrid
6. Use of these HTP data to study evolution of function
7. Origin of Eukaryotes, endo-symbiosis, explosion of gene duplicates
8. Genome Evolution: Genome duplications

Cancer Genomics

Period: 10 - 21 January 2022, see www.CSnD.nl/courses

Course Coordinator: Dr. Josephine Daub, Princess Máxima Center for Pediatric Oncology, Utrecht (J.T.Daub@prinsesmaximacentrum.nl)

Faculty
Dr. Jayne Hehir-Kwa, Princess Máxima Center for Pediatric Oncology, Utrecht
Dr. Patrick Kemmeren, Princess Máxima Center for Pediatric Oncology, Utrecht
Lecturers from the Máxima Center, UMCU, UU, Hubrecht Institute, Hartwig Medical Foundation
Description of content

  • Introduction to Next-Generation Sequencing (NGS) and Cancer genomics
  • Detection, visualization, interpretation and annotation of somatic variants in tumors, including single nucleotide variants (SNVs), copy number variations (CNVs) and structural variants (SVs)
  • RNA sequencing based analysis
  • Genome browsers and cancer specific databases
  • Invited lectures covering current advances in Cancer Genomics research, e.g.:
    • Mutational signatures
    • Single cell transcriptomics
    • Tumor classification
    • Survival analysis
    • Genetic interactions
    • Pathway and Network analysis

Literature/study material used:
Handouts of lectures and practical assignments
Research papers
CoCalc computing environment
Online resources and documentation
Course laptop (students should bring their own)

Registration:
You can register for this course via Osiris Student. More information about the registration procedure can be found here on the Studyguide.

Mandatory for students in own Master’s programme
No.

Optional for students in other GSLS Master’s programme:
Yes.

Prerequisite knowledge:

  • Basic-level experience with programming in R and with working in a Unix/Linux environment
  • Basic understanding of bioinformatics and cancer biology

Introduction to Research Data Management for Life Sciences

The course Introduction to Research Data Management gives practical insights on Data Management for scientists. Basic knowledge of relational databases, entity-relationships models, relational models and SQL with MySQL is provided during the course. The programming language used to process data from and to the database is Python.

Proper management of research data is a requirement by funding agencies, publishers or academic institutions. This course provides the technical keys to understand how to model, structure and query data. Benefits of having these skills are numerous: a better insight on how to manage research data and comply with research data management policies, more efficiently store and reuse important data for computational experiments and awareness of the current techniques available to make these tasks easier. The modeling part of the course is focused on communicating the important aspects of datasets to colleagues or an audience via simple models that can be included in posters or other types of publications.

The course is divided into six modules:

  1. Research Data Management and Databases
  2. Data and Models
  3. Starting with MySQL and Workbench
  4. Structuring and Querying Data
  5. Storing and Processing Data with Python
  6. Working with data repositories

Next, more practical insights are given, mainly about:

  • Data modeling with E-R and relational schemas
  • SQL (mainly DML)
  • Working with MySQL and Workbench (modeling)
  • Working with publicly available data by modeling, importing and integrating data into relational databases.
  • Working with data schemas and public repositories
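
As a rough idea of what modeling, importing and querying data from Python looks like, the sketch below builds a small two-table relational schema and runs a join with aggregation. It uses Python's built-in sqlite3 module as a stand-in for the MySQL server used in the course, so SQL dialect details may differ; the table names, column names and values are made up for this example.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL server used in the course.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: one sample has many measurements (one-to-many relation).
cur.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, tissue TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE measurement ("
    "id INTEGER PRIMARY KEY, "
    "sample_id INTEGER NOT NULL REFERENCES sample(id), "
    "gene TEXT NOT NULL, "
    "expression REAL NOT NULL)"
)

# DML: insert rows with parameterised statements (toy values).
cur.executemany("INSERT INTO sample (id, tissue) VALUES (?, ?)",
                [(1, "liver"), (2, "brain")])
cur.executemany(
    "INSERT INTO measurement (sample_id, gene, expression) VALUES (?, ?, ?)",
    [(1, "TP53", 4.0), (1, "BRCA1", 2.0), (2, "TP53", 6.0), (2, "BRCA1", 3.0)],
)
conn.commit()

# DML: join the two tables and compute the mean expression per tissue.
rows = cur.execute(
    "SELECT s.tissue, AVG(m.expression) "
    "FROM sample AS s JOIN measurement AS m ON m.sample_id = s.id "
    "GROUP BY s.tissue ORDER BY s.tissue"
).fetchall()
conn.close()
```

Here `rows` holds one (tissue, mean expression) pair per tissue. Keeping samples and measurements in separate tables linked by a key is the kind of design decision an E-R model makes explicit.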

The final grade consists of:

  • Online quizzes (10%), three attempts per quiz. Min. score is 6 per quiz.
  • Two minor assignments (20%), No minimum score. There is one opportunity to resubmit one of the two minor assignments to improve its grade.
  • A final assignment (70%), Min. score is 5. There is one opportunity to resubmit the final assignment if the grade is less than 6.

Min. final grade to pass the course: 5.5

Literature/study material used:
Course content and material is hosted on https://elearning.ubc.uu.nl

A virtual machine (Ubuntu Linux) containing all the necessary software is available for students.
Alternatively, students may choose to install the required software on their own machine. In that case, they will need a computer environment with:

  • Minimum: Python 2.7.9 or Python 3.4.x/3.5.x
  • Jupyter (IPython) notebook
  • MySQL 5.7.X branch
  • MySQL Workbench CE 6.3.X
  • Python pandas (http://pandas.pydata.org/)
  • Windows users can install WinPython (http://winpython.github.io/) containing all the necessary modules by default

Master Level Computational Biology

During the course, the emphasis will be on composing and analysing exact models based on specific hypotheses. The results of the analyses offer an understanding of the original biological system. The models studied address fundamental questions from a variety of biological fields, including:

* Multi-level evolution:
- pre-biotic evolution
- eco-evolutionary dynamics and spatial pattern formation
- genome evolution (e.g. interaction between gene regulation and evolution)
* Developmental dynamics:
- pattern formations
- morphogenesis and mechanical interactions between cells
- evolution and morphogenesis
* Immune system dynamics:
- self/non-self discrimination
- host-pathogen co-evolution
* Behaviour:
- self-structuring through local interactions
- interface between learning and evolution

A number of different model formalisms are used, namely:
* (Non-linear) differential/difference equations (ODE and PDE)
* Cellular automata machines
* Individually oriented models
* Evolutionary computation
After completing the course, the student:

  1. knows how computational models of dynamical systems can be used to investigate biological processes (e.g. the topics mentioned in 3). In particular:
    • the need of computational models
    • how to formulate computational models
    • how to analyze computational models
    • how to interpret results of computational models
  2. knows implicit assumptions of various model formalisms. In particular:
    • ODE and PDE.
    • FSM and CA
    • event based models (e.g. Gillespie)
    • individual (particle) based models
    • evolutionary models
  3. knows basic theory derived from computational modeling. In particular:
    • network dynamics (e.g. cell cycle, cell differentiation)
    • spatial pattern formation (e.g. spiral and chaotic waves)
    • multilevel evolution (genome evolution, eco-evolutionary dynamics)
    • multilevel morphogenesis (from genes, to cells to tissues to organism)
  4. is able to understand current literature using modeling. In particular:
    • extracting the bottom line
    • evaluating the explicit and implicit assumptions of the models
    • relating the discussion to the theoretical knowledge gained in 3.

Microbial Genomics

Course description
Microbes (bacteria and fungi) are crucial for life on earth and are highly relevant to human life: as beneficial organisms, as pathogens, as food producers, in nutrient cycling, in agriculture, and in many other aspects of our daily lives. The genomes of microbes provide a wealth of information on the processes and mechanisms that these organisms use in their environment, for instance the mechanisms that pathogens use to overcome host immunity or antibiotics, or the enzymes that fungi use to produce interesting metabolites that can be used in medicine or agriculture. Furthermore, microbes are commonly used as model systems to research molecules and molecular processes and to uncover how organisms function, develop, and interact with their environment.

In this course you will learn how to analyse genome data of individual microbes, but also of microbial communities (metagenomics). The first part of the course will focus on basic bioinformatic skills (Linux, bash, and command line tools) and the analysis of bacterial genomes. The second part will focus on the analysis of eukaryotic microbes, with an emphasis on fungal genomes, comparative approaches, and expression analysis. The course will have theoretical lectures, but will mainly consist of hands-on bioinformatic practicals. Therefore, an affinity for working with a computer is required.

Structural Bioinformatics & Modelling

Computational structural biology is a mature field of research whose contribution to life sciences is increasingly appreciated. The aim of this course is to provide a solid basis of computational structural biology methods, with an emphasis on practical protein modelling and simulation, to interested MSc and PhD students in the life sciences. Further, given the lack of emphasis on practical computational research in MSc and PhD courses, this course is designed to have a smooth learning curve regarding the GNU/Linux environment and its command-line interface. By the end of the course, the students are expected to master the three major computational structural biology methods (homology modelling, molecular dynamics, and protein docking) not only from a user perspective but also from a theoretical standpoint.

The course is scheduled to last three weeks. In the first two weeks there are theoretical lectures (including some exercises) in the morning (9:00-12:00) and practical sessions in the afternoon (13:15–17:00). The students are required to summarize the results of the computer practicals by writing a short article in the form of a communication for the Journal of the American Chemical Society. In the second week of the course a guest lecture giving an industry perspective on the topic will be organised. The third week is reserved for article writing, self-study and the final exam. The first afternoon is devoted to the installation of the material and a short crash course on GNU/Linux and the command-line interface.

The theoretical part consists of classical lectures (see programme above) covering the various aspects of computational modelling of biomolecular systems, together with a few exercise sessions integrated within the lectures. These exercises are meant to illustrate some aspects of the methodology discussed. Through a number of simple Python scripts, students will be able to play with some of the techniques discussed and visualize the impact of various parameters on the simulation results. The material for the lectures is based in part on the following book (recommended for further in-depth reading):

A.R. Leach, Molecular Modelling: Principles and Applications, 2nd edition, Pearson Education Ltd, 2001.
PDF of the lecture slides will be provided after each lecture.

The computer practical part [1] is divided in three main modules, each focused on a major computational structural biology method. The philosophy of the practical components of the course follows also our previous experience: the students are given a set of instructions and follow them at their own pace, with the assistants helping out whenever necessary.

The first module comprises the setup and analysis of a molecular dynamics simulation of a small peptide and is based on our previous BSc course and peer-reviewed educational article published in Biochemistry and Molecular Biology Education [2]. The students will make use of GROMACS [3], a widely used software for molecular dynamics simulation, to characterize the conformational landscape of a small peptide and extract representatives that will be used in the third and last module.

The second module covers homology modelling and guides the students through all the stages of building a protein model from a structurally characterized homologue. It makes use of SWISSMODEL [4] for model building, and PyMOL [5] for visualization. The students will use the programs’ command-line interface instead of the readily available web servers. This, we hope, will familiarize them with an important component of computational research, as well as bring them closer to the tools and their many options.

The third module covers the docking of the homology model built in the second module with the peptide conformers extracted from the simulation of the first module. The students will use bioinformatics interface predictors and HADDOCK web servers [6] to predict the interface between the two molecules and build models of their interaction by data-driven docking.

References:

  1. https://www.bonvinlab.org/education/molmod_online/
  2. Rodrigues JPGLM, Melquiond ASJ, Bonvin AMJJ. Molecular Dynamics characterization of the conformational landscape of a small peptide. Biochemistry and Molecular Biology Education. 44, 160-167 (2016).
  3. http://www.gromacs.org
  4. https://swissmodel.expasy.org
  5. http://pymol.org
  6. https://wenmr.science.uu.nl

Please note: If you have already taken the courses Biological Modelling AND Bioinformatics and Genomics (or similar courses) in your Bachelor's programme, you can start with your 51 EC major research project and replace these 7.5 EC with 7.5 EC chosen from the additional/elective bioinformatics courses.