“As a researcher, this tool saves me a lot of work and stress”

A simple way of harvesting and analysing social media data with 4CAT

In this series of interviews, we show what contribution projects can make to FAIR research IT. The research teams of the projects have received a grant from the FAIR Research IT Innovation Fund.

Good news for scientists whose technological skills are limited but who want to work with social media data in their research: you can now use 4CAT, a tool that helps you harvest and analyse such research data in a simple way. To make things even easier for researchers, Utrecht University now offers 4CAT to all staff and students. Not only is this useful and efficient, it also contributes to data that are more FAIR.

Cultural studies scholar Jeroen Bakker has no doubts at all: since he started working with 4CAT, harvesting data has become a lot easier. At the Utrecht University Data School, he does research into online public debates and their influence on Dutch democracy. “My research frequently involves working with social media data,” he says. “For instance, I analyse millions of posts on Twitter, Reddit and Telegram. And I zoom in on specific messages and user groups and then give them a qualitative interpretation.”

What does the Data School do?

The Data School is part of the Centre for Digital Humanities, a Utrecht University institute that focusses on the digitisation of the humanities. Researchers at the Data School look at how big data and artificial intelligence influence citizenship and democracy. They work particularly on practice-oriented projects commissioned by local and regional authorities.

To get all these posts from the internet onto his computer, Jeroen Bakker previously used all kinds of different software tools. Social media companies give access to user data through a so-called Application Programming Interface (API), he explains. “That is a way in which you can ‘talk’ with the platform as an outsider. For instance, you could say: give me all posts from March 2023 containing the word ‘democracy’. Next, you receive these posts as a raw data file.”
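To make the idea of such a request concrete, here is a minimal Python sketch. It does not call any real platform API. It only mimics the request “give me all posts from March 2023 containing the word ‘democracy’” by filtering a small, invented list of posts. The `Post` record, the sample posts and the `query_posts` function are assumptions made for illustration and are not part of 4CAT or of any platform.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    """Hypothetical post record; real platforms return far richer data."""
    author: str
    created: date
    text: str


def query_posts(posts, keyword, year, month):
    """Return the posts from the given month whose text contains the keyword."""
    keyword = keyword.lower()
    return [
        p for p in posts
        if p.created.year == year
        and p.created.month == month
        and keyword in p.text.lower()
    ]


# Invented sample data, for illustration only.
SAMPLE_POSTS = [
    Post("user_a", date(2023, 3, 2), "Democracy needs an open debate."),
    Post("user_b", date(2023, 3, 15), "Nice weather in Utrecht today."),
    Post("user_c", date(2023, 4, 1), "What does democracy mean online?"),
]

# "Give me all posts from March 2023 containing the word 'democracy'."
matches = query_posts(SAMPLE_POSTS, "democracy", 2023, 3)
for post in matches:
    print(post.created.isoformat(), post.author, post.text)
```

A real API request works along the same lines, but each platform has its own rules for access, query syntax and data format.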

The thing is: collecting data in this way is rather complicated and very time-consuming. For instance, Facebook uses different methods from Twitter or Snapchat. Besides, regulations may change from time to time. That is why Jeroen Bakker was pleasantly surprised when he was introduced to 4CAT, an online data collection tool developed by researchers at the University of Amsterdam (UvA, see text box). “A whole world opened up to me.”

For those researchers who do not work with programming languages, 4CAT is a godsend.

User-friendly software

What kind of tool is 4CAT exactly? “The abbreviation stands for Capture and Analysis Toolkit,” explains Jeroen Bakker’s colleague Sander Prins. As project leader, he brought the tool to Utrecht University. “Actually, the abbreviation says exactly what you can do with 4CAT: collect social media data (capture) and analyse it. And not just from one channel, but from several platforms simultaneously. Suppose you are looking for videos on TikTok and Instagram with a particular hashtag. 4CAT only needs a few mouse clicks to retrieve this information.”

In addition, the tool contains all kinds of useful options to analyse the imported data right away. Sander Prins: “For instance, you can have a look at how posts on a particular topic are spread over time. That information can easily be converted into a handy table or histogram using 4CAT. But the tool can also produce more elaborate ‘network visualisations’ that show you at a single glance how several social media accounts relate to each other.”
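For readers curious about what such an overview of posts over time involves, the following Python sketch counts posts per day and prints a simple text histogram. It is only an illustration of the idea, not how 4CAT itself works. The posting dates are invented and `posts_per_day` is a hypothetical helper.

```python
from collections import Counter
from datetime import date

# Invented posting dates, for illustration only.
post_dates = [
    date(2023, 3, 1), date(2023, 3, 1), date(2023, 3, 2),
    date(2023, 3, 4), date(2023, 3, 4), date(2023, 3, 4),
]


def posts_per_day(dates):
    """Count posts per calendar day, returned in chronological order."""
    counts = Counter(dates)
    return sorted(counts.items())


# One row per day that has posts: the date, a bar of '#' marks, and the count.
table = posts_per_day(post_dates)
for day, count in table:
    print(f"{day.isoformat()}  {'#' * count}  ({count})")
```

Days without any posts do not appear in this table. A fuller analysis would probably add them with a count of zero.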

Jeroen Bakker and Sander Prins at the Drift, Utrecht (photographer: Laura Hompus)

This means that researchers no longer have to work with APIs or other technical methods, but can nevertheless import and process large quantities of data easily and quickly. “Scientists are almost always book smart, but not everybody is programming smart,” says Sander Prins. “For those researchers who do not work with programming languages, 4CAT is a godsend.”

The program has a user-friendly interface in which you simply click and fill in text boxes, so you do not have to know anything about programming. Sander Prins: “If you have never worked with programming languages before, this is the tool you have been looking for.”

Collaboration with University of Amsterdam

4CAT is developed by the Digital Methods Initiative (DMI), part of the University of Amsterdam. The Data School of Utrecht University collaborates closely with this team. “We are not the creators of 4CAT, but contribute to its development by providing feedback and additions,” Sander Prins explains. “Where DMI focuses more on the technical aspects and developments of 4CAT, we put the emphasis on data ethics: how do you deal carefully with the user data you are collecting? In this way we really complement each other.”

For all staff and students

Anyone who wants to can download the code of 4CAT free of charge and install it on their own laptop. However, this still requires some technical knowledge. To make it easier for users, 4CAT has recently been made available to researchers and students at Utrecht University. Sander Prins: “The tool is now running on a server of the university.” Researchers who would like to use 4CAT can request access via https://4cat.cdh.uu.nl/request-access/. Students and lecturers who want to use the tool for educational purposes get access via this link. Or send an email to s.prins@uu.nl.

“Thinking in data is like a particular ‘pair of glasses’ you have to put on,” says Jeroen Bakker, researcher.

Jeroen Bakker, researcher working with 4CAT (photographer: Laura Hompus)

The collected research data, such as posts, tweets, photos and videos, are also carefully stored on the UU server. “That is such a good thing: now I don’t need to keep my laptop running for days on end when using 4CAT,” says Jeroen Bakker enthusiastically. “Besides, it saves a lot of storage space, and there is no more hassle with back-ups and external hard drives. But most of all, it saves a lot of stress, because I am sure that the system remains stable and that I won’t be losing data.”

Grant from the FAIR Research IT Innovation Fund

For the technical management of the 4CAT installation on the UU server, the Centre for Digital Humanities received a grant from the FAIR Research IT Innovation Fund (see text box). The tool helps make research data more FAIR, says Sander Prins. “4CAT makes collecting data from online platforms more accessible. Moreover, it is done according to 4CAT’s design principles: transparent, modular and traceable. You can precisely trace back the data you harvested and how you did it. This improves the reproducibility of your research.”

Jeroen Bakker adds: “4CAT also offers the option to share your data analyses with others, simply by creating a link. Via that link, others can have a look at your data files or use them if they want to.” In short: if you work with 4CAT, your research data will become more Findable, Accessible, Interoperable and Reusable.

“Of course, we observe the regulations concerning data protection,” emphasises Sander Prins. “Because, although 4CAT makes social media data more easily accessible for research, this does not mean that the data can just be shared publicly. Using ‘Tactful non-contact research’, a document specially compiled for this type of research, we look at what steps are needed each time. When researchers work with special personal data, these are anonymised or pseudonymised.”

About the FAIR Research IT Innovation Fund

Utrecht University wants each research team to be well supported in the field of research IT. One of the ways to achieve this is through the FAIR Research IT Innovation Fund. Scientists can receive a grant for projects which, for instance, improve the IT infrastructure of scientific research. Examples are projects that provide sufficient storage capacity for data, or the development of tools and services that help researchers in their work. FAIR and open science principles guide the selection of projects. Other researchers must be able to easily and quickly reuse the knowledge and solutions.

Contributing to better research methods

Certainly in their own discipline, the humanities, there are steps to be made in the field of digital research methods, according to Jeroen Bakker and Sander Prins. “In the past few years, we have seen an increase in digital analysis methods, but we have also seen some trepidation about using them. By making tools such as 4CAT more widely available, we want to encourage better research methods.”

Sander Prins and Jeroen Bakker regularly organise introductory workshops for scientists who want to know more about what 4CAT can do for their research. Jeroen Bakker: “In these workshops we show you the ropes. Thinking in data is like a particular pair of glasses you have to put on. As soon as you grasp that, you will see a world of possibilities. And that can add value to your research.”