OpenAlex, a big step towards Open Science?
Interview with Jeroen Bosman
Already a year ago, the new database OpenAlex offered data on more than 250 million works, far more than competitors such as Web of Science, Scopus or Google Scholar. And to top it off, it is completely free. What does OpenAlex have to offer? Exactly how open are the publications? And will this database replace expensive subscriptions in the near future? Open Science specialist Jeroen Bosman explains.
OpenAlex is a worldwide database of scientific output. The name refers to the Library of Alexandria, which aimed to build a universal scholarly collection. The name also signals full openness and availability. That is why OpenAlex fits in well with the principle of Open Science: striving for an academic world in which knowledge and research are freely accessible and reusable for everybody.
OpenAlex offers more than other databases, how is that possible?
It is because OpenAlex is much more inclusive. It puts fewer limitations on what it includes in the database, for instance in terms of languages and formats of publications. This benefits researchers who write in other languages or work in certain disciplines. The humanities, for instance, mainly publish in books and not in journals. OpenAlex provides metadata on these books and book chapters. By comparison, Scopus and Web of Science only include journal articles that have an English-language abstract, even if the article itself is written in another language. As a result, many publications in Spanish or Mandarin are missing from these databases, even though lots of research is done in these languages. Moreover, Scopus and Web of Science mainly include data on journal articles, and specifically from journals that are widely discussed and cited. OpenAlex has chosen not to do so, because this can lead to a narrow view of what science produces. For example, in OpenAlex you also find information about preprints, early versions of articles, which are becoming increasingly important in many disciplines.
How does OpenAlex get all that data?
Its main sources are databases of organizations that register research, such as CrossRef. This organization gives publications a unique identifier: a DOI. The metadata of all publications with a DOI are known and public, and OpenAlex gets its metadata from there. In addition, OpenAlex took over the database of Microsoft Academic, which contains data on publications resulting from conferences. OpenAlex is completely transparent about its sources: you can check everything on the website.
How open is OpenAlex really?
Use of the database is free and you can do whatever you want with the data. For the most part, metadata consist of facts that are not protected by copyright: think of the title, the author, the subject and the keywords given by the authors themselves. The metadata also include the abstracts and the references at the end of the article. However, some major publishers have not yet made those abstracts and citations openly available for inclusion in databases. And paid databases impose restrictions on use and sharing.
Where it can, OpenAlex also offers the abstracts and citations, making it a good database for doing research about research, for instance if you want to find out how often a subject is researched, in which countries and in which languages. You can answer questions such as: how much is published in French in the Netherlands? How many works of an author are open access? How often is an author cited? Anyone can download all that data for analysis. The difference with search engines such as Google Scholar is that you can do the analysis in a very systematic and reproducible way.
Through a so-called CC0 designation, users know that OpenAlex does not claim any rights to the data and so gives permission to export and share the data. Where available, OpenAlex also provides links to open access versions of publications, either at the publishers or in academic repositories.
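The kinds of questions mentioned above can be expressed as filtered queries against the OpenAlex web API. The sketch below only builds the query URLs and shows how a total count could be read from a response; it does not contact the service. The endpoint, the filter names (`language`, `institutions.country_code`, `author.id`, `is_oa`) and the `meta.count` field reflect the public API as understood at the time of writing and should be checked against the current documentation. The author identifier is a placeholder, and using the country of the authors' institutions as a stand-in for "published in the Netherlands" is an assumption.

```python
from urllib.parse import urlencode

# Public works endpoint, as understood at the time of writing (assumption).
OPENALEX_WORKS = "https://api.openalex.org/works"


def build_works_url(filters, per_page=1):
    """Build a works query URL from a dict of filter names and values."""
    # Filters are combined with commas, each written as name:value.
    filter_param = ",".join(f"{name}:{value}" for name, value in filters.items())
    # Keep ':' and ',' unescaped so the filter string stays readable.
    query = urlencode({"filter": filter_param, "per-page": per_page}, safe=":,")
    return f"{OPENALEX_WORKS}?{query}"


def count_from_response(response):
    """Read the total number of matching works from a parsed JSON response."""
    return response["meta"]["count"]


# Works in French with at least one author affiliated in the Netherlands.
french_in_nl = build_works_url({"language": "fr", "institutions.country_code": "nl"})

# Open access works by one author; "A0000000000" is a placeholder ID.
oa_by_author = build_works_url({"author.id": "A0000000000", "is_oa": "true"})

print(french_in_nl)
print(oa_by_author)
```

Because each question is captured as a plain URL, the same query can be rerun later or shared with others, which is what makes this kind of analysis reproducible.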
Does a team also check whether everything that comes in is reliable?
As with paid databases, the reliability of the data is a focus of OpenAlex. The emphasis is on metadata, not on the content of the publications. Checking that content is up to the publishers. That is why OpenAlex trusts CrossRef to do a proper check on the organizations requesting DOIs. But it may happen that data on publications need to be corrected afterwards. Such corrections are also made in other databases, but more behind the scenes. OpenAlex is very transparent in this respect. For instance, in OpenAlex you can filter on publications that have been withdrawn by publishers. This transparency is important for the so-called self-correcting capacity of science.
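As an illustration of the filter on withdrawn publications: work records in OpenAlex carry a boolean field, `is_retracted` as far as the public documentation indicates, which can be used server-side (a filter such as `is_retracted:true`) or on downloaded records. The sketch below takes the second route. The sample records are invented for the example and are not real OpenAlex data.

```python
def split_by_retraction(works):
    """Split work records into (retracted, not_retracted) lists.

    A record counts as retracted only if its "is_retracted" field is True;
    records without the field are treated as not retracted.
    """
    retracted = [w for w in works if w.get("is_retracted") is True]
    not_retracted = [w for w in works if w.get("is_retracted") is not True]
    return retracted, not_retracted


# Hypothetical records, shaped like a small subset of an OpenAlex work object.
sample_works = [
    {"id": "example-1", "title": "Example study A", "is_retracted": False},
    {"id": "example-2", "title": "Example study B", "is_retracted": True},
    {"id": "example-3", "title": "Example study C"},
]

retracted, not_retracted = split_by_retraction(sample_works)
print([w["id"] for w in retracted])
print([w["id"] for w in not_retracted])
```

Keeping retracted records visible and filterable, rather than silently dropping them, is what lets an analysis state explicitly how it handled withdrawn work.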
So: an ideal database?
The ideal database is still in the future. Although OpenAlex aims at a balanced presentation of its metadata, there is still an imbalance in it. Web of Science and Scopus try very hard to make the metadata as complete as possible by assigning keywords themselves. Because of this, and because of their stricter selection, their databases give a more uniform impression than OpenAlex. At OpenAlex, the emphasis is more on combining all kinds of openly available metadata in a smart and user-friendly way.
OpenAlex is free now, but how long can it stay that way?
You might as well ask yourself how such a small company can offer such a large, high-quality and competitive database. OpenAlex is completely transparent about the answer: through donations from charitable funders such as Arcadia. But for its basic services it wants to keep its independence. That is why it starts from open data, so its ‘raw materials’ are free. Institutions can support OpenAlex via an institutional subscription, which brings better support and faster automated access to the database. OpenAlex claims to have a financially sustainable model.
The chance that OpenAlex will be taken over by a commercial publisher is small. Because the database releases its data under CC0, anybody may share and copy the data. As a result, the data are probably already stored in multiple places. So even if someone buys the whole company, the data remain public.
Will OpenAlex replace paid databases in the future?
That remains to be seen. I must say that in libraries the paid databases are viewed more and more critically. The question now is: what are the essential use cases for which you really need Scopus or Web of Science? Does OpenAlex meet the demands of searches such as those for systematic reviews? With the arrival of OpenAlex and other open databases such as OpenAire and The Lens, I think the list of use cases for which you really need Scopus or Web of Science is getting shorter.
Although we are lucky in Utrecht to be able to afford two expensive databases, OpenAlex fits in better with the views we hold as a university. After all, OpenAlex facilitates Open Science. This movement wants to make knowledge and research immediately available, free of charge and accessible to everybody. Utrecht University supports this movement and signed the Barcelona Declaration to promote the openness of research information. Transparency is of major importance, especially if we evaluate research and researchers based on that information. As part of the Barcelona Declaration, an international working group will be looking at the question of whether, when and how closed databases can be replaced.
Can we speak of idealism in the case of OpenAlex?
Oh yes, the people who work there have a clear conviction about the importance of their work. Besides, OpenAlex offers equal opportunities to all scientific output, regardless of language or format. It has committed to certain principles of open data infrastructure. To that end, it has built in safeguards. What is open now will remain open in the future.