Universiteit Leiden

LUCL to start working with Macroscope: ‘One place filled with datasets and tools’

Over the coming years, LUCL will be collaborating on the development of Macroscope, a new scientific infrastructure that maps social change at the population level. Professors Gijsbert Rutten, Stephan Raaijmakers and Carole Tiberius tell us more about the project.

By securely linking and analysing large datasets, Macroscope will enable complex social and cultural processes to be monitored and researched on a large scale. An important part of Macroscope is a broad-based Netherlands Media Corpus, made accessible through a user-friendly annotation and analysis infrastructure (the Text Suite). The use of AI techniques will also open up new research opportunities. In this way, Macroscope will help to improve the scientific infrastructure of the Social Sciences and Humanities. ‘We already had CLARIAH for the humanities,’ says Rutten. ‘The social sciences had ODISSEI. Partly at the request of NWO, we have already started collaborating in the SSHOC-NL (Social Sciences and Humanities Open Cloud) project. Macroscope is now building on that.’

Major collaboration

Fourteen universities and a number of institutes are working together to make as many datasets as possible accessible from a single location and to link them together. ‘Our ambition is to interpret the dynamics in society,’ says Rutten. ‘At LUCL, for example, we have been working on misinformation and disinformation for a long time. We can monitor these better if we can link different datasets.’

The aim of Macroscope is twofold. On the one hand, existing information has to become more readily available and, on the other hand, new tools are being developed to examine this data. ‘We are going to work with generative AI and Large Language Models,’ says Raaijmakers. ‘Of course, we have had AI tools for a long time, but with these Large Language Models, you can also converse interactively about analyses. This makes research much more dialogue-oriented, which is a very interesting development. We want to investigate how best to use this tooling to support scientific work. Will it lead to new hypotheses? Can you conduct interactive research? And what do you do with the dark side of AI? We need to build a new value system in which we deal sensibly with issues such as authorship and authenticity.’

Linguistic analysis

Macroscope will ultimately cover the entire SSH domain, but LUCL will focus primarily on linguistic analysis. Tiberius, who works at both the university and the Institute for the Dutch Language (INT), which is also involved in the project, explains: ‘Our goal is to regulate access to data (including sensitive data) that can then be analysed with all kinds of tools, such as topic modelling and sentiment analysis. That is why the INT is also involved in the data harvesting branch of the project. In concrete terms, for example, our goal is to set up a workflow for the Netherlands Media Corpus in close collaboration with the Royal Library. All kinds of data, from text to speech, video and old websites, will then be automatically processed and made accessible via a single digital research environment, the Text Suite. Part of the corpus has already been digitised, while the rest still needs to be.’

Rutten: ‘It’s a very attractive idea to have a single place in a few years’ time where linguists, students and other interested parties can go to apply various tools to all kinds of datasets and then compare and evaluate the output of those tools.’
