Published on Sep 10, 2023
OpenCitations Meta is a new database that contains bibliographic metadata of scholarly publications involved in citations indexed by the OpenCitations infrastructure. It adheres to Open Science principles and provides data under a CC0 license for maximum reuse. The data can be accessed through a SPARQL endpoint, REST APIs, and dumps. OpenCitations Meta serves three important purposes. Firstly, it enables disambiguation of citations between publications described using different identifiers from various sources. For example, it can link publications identified by DOIs in Crossref and PMIDs in PubMed. Secondly, it assigns new globally persistent identifiers (PIDs), known as OpenCitations Meta Identifiers (OMIDs), to bibliographic resources without existing external persistent identifiers like DOIs. Lastly, by hosting the bibliographic metadata internally, OpenCitations Meta improves the speed of metadata retrieval for citing and cited documents. The database is populated through automated data curation, including deduplication, error correction, and metadata enrichment. The data is stored in RDF format following the OpenCitations Data Model, and changes and provenance information are tracked. This article describes OpenCitations Meta and its production. OpenCitations Meta currently incorporates data from Crossref, DataCite, and the NIH Open Citation Collection. In terms of semantic publishing datasets, it is currently the first in data volume.

As a signatory of Publish Your Reviews, I have committed to publish my peer reviews alongside the preprint version of an article. For more information, see

This paper introduces OpenCitations Meta, an important new component in the open infrastructure that is being created by the team of OpenCitations. I see OpenCitations Meta as a crucial milestone in the development of this infrastructure. The paper is generally well written and offers a lot of useful information about OpenCitations Meta. I have a few relatively minor suggestions for improvements.

The paper uses a fair amount of Semantic Web terminology. Some parts of the paper may be difficult to understand for readers who are not familiar with this terminology. My suggestion would be to use the introduction to refer these readers to sources that can help them familiarize themselves with Semantic Web ideas and the related terminology.

I believe the most important weakness of the paper is the way in which the authors present the benefits of OpenCitations Meta. In the introduction and conclusion sections, and also in the abstract, the authors discuss three benefits of OpenCitations Meta. These are benefits relative to the infrastructure that was provided by OpenCitations before the release of OpenCitations Meta. While this is understandably a natural perspective for the authors, I don’t think it is particularly appealing to the typical reader. I expect readers to be interested in the benefits of OpenCitations Meta relative to other open and closed infrastructures. For most readers this will be much more important than the benefits of OpenCitations Meta relative to the OpenCitations infrastructure before the release of OpenCitations Meta. My suggestion to the authors therefore is to rewrite the introduction and conclusion sections of the paper in a more user-centric style (rather than the current style, which is more developer-centric).

Two small issues:

1. I don’t fully understand the idea of supplier prefixes in OMID identifiers, as discussed in Section 3.1. The authors state that “060 corresponds to the supplier prefix, which indicates the database to which the bibliographic resource belongs (in this case, OpenCitations Meta)”. Does this mean that all entities in OpenCitations Meta have the supplier prefix 060? If the prefix is always the same, why then do we need the prefix? It seems redundant. Also, it is not clear to me why the prefix starts with ‘06’. Why not ‘01’, for instance? I also don’t understand why the prefix must have a ‘0’ at the end. Some further explanation would be helpful.

2. The numbers reported in the first paragraph of Section 4 are important. My suggestion therefore is to present these numbers in a table rather than reporting them in the running text.

Full disclosure: I am chair of the advisory board of OpenCitations.

