Project Page: StudyPaths


The BSc Earth Sciences program currently has 6 tracks, each designed to group courses that fit a certain topic. The current situation has some disadvantages. For example, students have many possible combinations of tracks and courses to choose from, and the tracks do not necessarily align with the requirements for graduation. Therefore, the program directors would like to update the program and the tracks within it. Ideally, the project leaders would talk to all students about their choice of tracks and courses, but that is practically impossible. Instead, data from Osiris, the university's student information system, were analyzed to discover which tracks students currently choose. The results of these analyses will be input for deciding on the new tracks. Other inputs are the existing tracks and the tracks suggested by the staff.

Potential benefits of setting new tracks:

  • For students: clearer information on what a track entails, and assurance that a track will lead to graduation.
  • For tutors: easier to advise students on which track to choose.
  • For program directors: a new curriculum that can last for the coming 20 years and that takes into account the development of the competencies relevant to the program.

To find out about students’ study paths, we will use data from the student information system. Students leave digital traces of their learning activities, and these traces contain valuable information for understanding the learning process. Process mining is a technique for analyzing such digital traces. Its basis is the so-called event log: a collection of events, each of which refers to an activity performed in a process.
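To make the notion of an event log concrete, the sketch below represents one as a list of records, each with a case identifier (here, a pseudonymized student ID), an activity (here, a course), and a time (here, an academic year and teaching period). Grouping the events per student and ordering them in time gives one trace per student. The field names, the year/period encoding, and the course codes are hypothetical and chosen for illustration only; the actual data extracted from the student information system may be structured differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One event in the log: a student (case) following a course (activity)."""
    case_id: str   # pseudonymized student identifier (hypothetical format)
    activity: str  # course code (hypothetical codes in the example below)
    year: int      # academic year in which the course started
    period: int    # teaching period within that year (assumed numbering)


def traces(log: list[Event]) -> dict[str, list[str]]:
    """Group events by case and order each group in time: one trace per student."""
    per_case: dict[str, list[Event]] = defaultdict(list)
    for event in log:
        per_case[event.case_id].append(event)
    return {
        case: [e.activity for e in sorted(events, key=lambda e: (e.year, e.period))]
        for case, events in per_case.items()
    }


log = [
    Event("S001", "COURSE_A", 2014, 1),
    Event("S001", "COURSE_C", 2015, 2),
    Event("S002", "COURSE_A", 2015, 1),
    Event("S001", "COURSE_B", 2014, 2),
]
print(traces(log))
# {'S001': ['COURSE_A', 'COURSE_B', 'COURSE_C'], 'S002': ['COURSE_A']}
```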

Questions to guide the analyses:

  • How many unique tracks do students follow?
  • Which combinations of courses do students follow?


Input data

In this project, two analyses were performed: i) discovery of study paths and ii) frequency analysis of course pairs. In line with the focus of the project, the input is study data of bachelor's students in Earth Sciences. Specifically, the data is extracted from the university's student information system using the following criteria: courses followed by students who started Earth Sciences between 2014 and 2017 and completed their bachelor's thesis.

Later cohorts are not included, to rule out any effect of the Covid-19 pandemic on the results. Note that pseudonymization is applied to the input data at the extraction stage to remove personal information, as required by privacy regulations (e.g., the GDPR). The extracted input data contains over 8000 data points about more than 80 courses followed by 406 students.

Data cleaning

As the educational program has evolved over the past years, its courses have changed accordingly: new courses have been added, and existing courses have been modified or removed. Such changes have inevitably resulted in different names and codes for the same course. To avoid inconsistency, course codes and names are standardized in the input data. In addition, some courses are filtered out because the program managers marked them for exclusion. The cleaned data consists of 6025 data points for 47 courses followed by 406 students. An event log is then created from the cleaned input data. To capture which groups of courses are taken together, courses starting at the same time in an academic year are combined into a single activity.
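The cleaning and grouping steps above can be sketched as follows. This is a minimal illustration, assuming that each record is a (student, course code, year, period) tuple and that "starting at the same time" means the same academic year and teaching period. The code mapping, the excluded course, and all course codes are hypothetical; in the project these come from the standardization work and the program managers' exclusion list.

```python
from collections import defaultdict

# Hypothetical mapping from outdated course codes to one standardized code.
CODE_MAP = {"GEO-101-OLD": "GEO-101", "GEO-101B": "GEO-101"}
# Hypothetical set of course codes marked for exclusion by the program managers.
EXCLUDED = {"GEN-900"}


def clean(records):
    """Standardize course codes and drop excluded courses.

    records: iterable of (student, course_code, year, period) tuples.
    """
    cleaned = []
    for student, code, year, period in records:
        code = CODE_MAP.get(code, code)  # codes not in the map are kept unchanged
        if code not in EXCLUDED:
            cleaned.append((student, code, year, period))
    return cleaned


def to_event_log(records):
    """Merge the courses a student starts in the same year and period into one activity.

    Returns sorted (student, (year, period), activity) tuples; the activity label
    is that student's courses for that start time, sorted and joined with '+'.
    """
    groups = defaultdict(set)
    for student, code, year, period in records:
        groups[(student, year, period)].add(code)
    return sorted(
        (student, (year, period), "+".join(sorted(codes)))
        for (student, year, period), codes in groups.items()
    )


raw = [
    ("S1", "GEO-101-OLD", 2014, 1),
    ("S1", "GEO-102", 2014, 1),
    ("S1", "GEN-900", 2014, 2),
    ("S1", "GEO-201", 2015, 1),
    ("S2", "GEO-101B", 2014, 1),
]
for row in to_event_log(clean(raw)):
    print(row)
# ('S1', (2014, 1), 'GEO-101+GEO-102')
# ('S1', (2015, 1), 'GEO-201')
# ('S2', (2014, 1), 'GEO-101')
```

Joining the course codes into one label makes each node in the resulting study path a "group of courses taken together", at the cost of treating slightly different groups as entirely different activities.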


For the discovery of study paths, a well-known process mining discovery algorithm, the Fuzzy Miner, is applied to the event log to generate a process map.

Figure 1 shows an example study path.

As can be seen in Figure 1, there are many study paths, each consisting of several nodes, i.e., groups of courses taken together. In the figure, one study path is highlighted in blue, showing how its nodes are connected. As expected, the variety of study paths in the early years of study is limited compared with the end of the program. Simply put, branching increases over time, which is the natural result of students' choices among the many elective courses.
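The branching described above is visible in the directly-follows relation: how often one node is immediately followed by another across all students' traces. The sketch below computes only these counts. It is not the Fuzzy Miner, which additionally derives significance and correlation metrics and simplifies the map by aggregating and abstracting nodes and edges; the `min_count` threshold here is a crude stand-in for that simplification and an assumption of this sketch. The activity labels are hypothetical.

```python
from collections import Counter


def directly_follows(traces, min_count=1):
    """Count how often activity a is directly followed by activity b.

    traces: iterable of activity sequences (lists), one per student.
    min_count: drop relations seen fewer times than this (a crude filter,
    not the Fuzzy Miner's significance/correlation-based simplification).
    """
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))  # consecutive (a, b) pairs
    return {pair: n for pair, n in counts.items() if n >= min_count}


traces = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
print(directly_follows(traces))
# {('A', 'B'): 2, ('B', 'C'): 1, ('B', 'D'): 1, ('A', 'C'): 1}
print(directly_follows(traces, min_count=2))
# {('A', 'B'): 2}
```

In this toy example, node A branches to B and C, and B branches further to C and D, mirroring how branching grows along the study paths.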


For the frequency analysis of course pairs, courses are grouped in pairs, and the number of students following each pair is counted. These counts are then converted into percentages, which makes students' involvement in courses easier to interpret.

Figure 2 shows the heat map of course pairs and the number of students who followed each pair.
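The pair-frequency analysis described above can be sketched as follows: for each student, every unordered pair of courses they followed is counted once, the counts are converted into percentages, and the result is arranged as a symmetric course-by-course matrix such as the one behind a heat map. The source does not state what the percentages are relative to; this sketch assumes the total number of students. Student IDs and course names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def course_pair_percentages(courses_by_student):
    """Percentage of students who followed each unordered pair of courses.

    courses_by_student: dict mapping a student ID to the set of courses taken.
    Percentages are relative to the total number of students (an assumption).
    """
    n_students = len(courses_by_student)
    pair_counts = Counter()
    for courses in courses_by_student.values():
        # sorting makes ("A", "B") and ("B", "A") the same pair
        pair_counts.update(combinations(sorted(courses), 2))
    return {pair: 100 * n / n_students for pair, n in pair_counts.items()}


def to_matrix(percentages, courses):
    """Arrange pair percentages in a symmetric matrix (rows and columns = courses)."""
    index = {c: i for i, c in enumerate(courses)}
    matrix = [[0.0] * len(courses) for _ in courses]
    for (a, b), pct in percentages.items():
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = pct
    return matrix


students = {"S1": {"A", "B", "C"}, "S2": {"A", "B"}, "S3": {"B", "C"}, "S4": {"A"}}
pct = course_pair_percentages(students)
print(pct)
# {('A', 'B'): 50.0, ('A', 'C'): 25.0, ('B', 'C'): 50.0}
print(to_matrix(pct, ["A", "B", "C"]))
# [[0.0, 50.0, 25.0], [50.0, 0.0, 50.0], [25.0, 50.0, 0.0]]
```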

Study paths:

The most prominent observation is the variety of study paths. The number of unique paths amounts to 98% of the number of students, which means that almost every student followed a unique study path. Only 7 study paths were followed by more than one student, and none by more than 3 students. Moreover, the shortest study path contains 7 nodes, whereas the longest contains 15 nodes. Notably, some students followed courses after completing their thesis course. The earliest position at which the thesis course appeared in a study path was the fourth node.
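The statistics reported above can be computed from the study paths with a few lines of code. As a check on the reported figures, a ratio of 98% over 406 students corresponds to roughly 0.98 × 406 ≈ 398 unique paths. The sketch below assumes each path is a list of nodes in which a node's courses are joined with '+', and uses a hypothetical `thesis_code` to locate the thesis course; the toy paths are invented for illustration.

```python
from collections import Counter


def path_statistics(paths, thesis_code):
    """Summarize study paths, each a sequence of nodes (groups of courses).

    Nodes are assumed to be course codes joined with '+'; thesis_code is a
    hypothetical identifier for the thesis course.
    """
    variants = Counter(tuple(p) for p in paths)  # identical paths counted together
    thesis_positions = [
        i + 1  # 1-based node position
        for p in paths
        for i, node in enumerate(p)
        if thesis_code in node.split("+")
    ]
    lengths = [len(p) for p in paths]
    return {
        "unique_ratio": len(variants) / len(paths),
        "shared_paths": sum(1 for n in variants.values() if n > 1),
        "max_students_per_path": max(variants.values()),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "earliest_thesis_node": min(thesis_positions, default=None),
    }


paths = [
    ["A", "B", "THESIS"],
    ["A", "B", "THESIS"],
    ["A", "C", "D+THESIS", "E"],
    ["A", "C", "D", "THESIS"],
]
print(path_statistics(paths, "THESIS"))
# {'unique_ratio': 0.75, 'shared_paths': 1, 'max_students_per_path': 2,
#  'min_length': 3, 'max_length': 4, 'earliest_thesis_node': 3}
```

The third toy path also shows how a course after the thesis node (here "E") would appear in the data.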

Course pairs:

In the heat map of course pairs, two of the existing tracks were easily recognizable. A new insight was that students in one of those tracks relatively often chose a specific course outside their track. The explanation is that, in that period, this was the only course on offer that was somewhat close to the track's topic. So, the schedule also partly determines which courses students choose. Apart from this insight, there were no surprising results.


To summarize, two types of analyses were carried out using process mining. The analysis of study paths showed that students follow a multitude of different paths. This might indicate that students have broad interests, which could be a reason to offer more types of tracks within the program. The analysis of course pairs did not lead to any surprising findings, but it confirmed that course scheduling partly determines which courses students enroll in.

The next step in the project is to use the input from these analyses, together with the input from staff and directors and their experiences with the current tracks, to decide on a plan for altering or enhancing existing tracks and for establishing new tracks.

Contact person and email address

Dr. Elisabeth Addink, Faculty of Geosciences,