
Review of "Uncertain research country rankings. Should we continue producing uncertain rankings?"

Published on Apr 01, 2024
This Pub is a Review of
Uncertain research country rankings. Should we continue producing uncertain rankings?

Citation-based country rankings consistently categorize Japan as a developing country, even those produced by the most reputed institutions. This categorization challenges the credibility of such rankings, considering Japan's elevated scientific standing. In most cases, these rankings use percentile indicators, which are accurate if a country's citations fit an ideal model of distribution but can be misleading when they deviate from it. The ideal model implies a lognormal citation distribution and a power law relating the citation-based double rank, that is, the rank of each paper in the global list and in the country list. This report conducts a systematic examination of deviations from the ideal model and their consequent impact on evaluations. The study evaluates six selected countries across three scientifically relevant topics and uses Leiden Ranking assessments of over 300 universities. The findings reveal three types of deviations from the lognormal citation distribution: (i) deviations in the extreme upper tail; (ii) inflated lower tails; and (iii) deflated lower parts of the distribution. These deviations stem from structural differences among research systems; they are prevalent and have the potential to mislead evaluations at all research levels. Consequently, reliable evaluations must consider these deviations. Otherwise, while some countries and institutions will be correctly evaluated, failure to identify deviations in each specific country or institution will render evaluations uncertain. For reliable assessments, future research evaluations of countries and institutions must identify deviations from the ideal model.
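To make the "ideal model" in the abstract more concrete, here is a minimal sketch in Python (NumPy) of two checks one might run on citation counts. The first compares the observed share of papers above a threshold with the share predicted by a fitted lognormal. The second fits a power law of global rank against country rank by least squares. The function names, the decision to drop zero-cited papers before the lognormal fit, the tie handling and the synthetic data are all assumptions for illustration. This is not the procedure used in the paper.

```python
import math

import numpy as np


def lognormal_tail_check(citations, threshold):
    """Observed vs lognormal-predicted share of papers cited more than `threshold`.

    Hypothetical helper: the lognormal is fitted from the mean and standard
    deviation of log(citations) over papers with at least one citation,
    because log(0) is undefined.
    """
    c = np.asarray(citations, dtype=float)
    pos = c[c > 0]
    logs = np.log(pos)
    mu, sigma = logs.mean(), logs.std(ddof=1)
    # Normal survival function evaluated at log(threshold)
    predicted = 0.5 * math.erfc((math.log(threshold) - mu) / (sigma * math.sqrt(2)))
    observed = np.count_nonzero(pos > threshold) / pos.size
    return observed, predicted


def double_rank_fit(global_citations, country_citations):
    """Fit log(global rank) = log(a) + b * log(country rank) by least squares.

    Hypothetical helper: assumes the country's papers are part of the global
    list. Rank 1 is the most cited paper, and tied papers share the best
    (lowest) global rank.
    """
    g = np.sort(np.asarray(global_citations, dtype=float))[::-1]
    c = np.sort(np.asarray(country_citations, dtype=float))[::-1]
    # -g is ascending, so searchsorted counts global papers with strictly more citations
    global_rank = 1 + np.searchsorted(-g, -c, side="left")
    country_rank = np.arange(1, c.size + 1)
    b, log_a = np.polyfit(np.log(country_rank), np.log(global_rank), 1)
    return math.exp(log_a), b


# Synthetic example (made-up parameters): one country inside a larger world set
rng = np.random.default_rng(42)
country = rng.lognormal(mean=1.0, sigma=1.1, size=2_000)
world = np.concatenate([country, rng.lognormal(mean=1.2, sigma=1.1, size=18_000)])
observed, predicted = lognormal_tail_check(country, threshold=50)
a, b = double_rank_fit(world, country)
```

A gap between `observed` and `predicted`, or a poor power-law fit, would at most hint at the kinds of deviation the abstract describes. Real citation counts are discrete and include many zeros, which this sketch handles only crudely.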

As a signatory of Publish Your Reviews, I have committed to publish my peer reviews alongside the preprint version of an article. For more information, see

I enjoyed reading this paper. The paper builds on a series of earlier papers by the same author, most of them co-authored with Brito. The analyses presented in the paper are clearly explained and seem to have been carried out in a careful, meticulous way. Despite this, I am not entirely convinced by the main argument made in the paper. More specifically, I have four comments on the paper:

1. In Section 1.2 the author criticizes the use of imprecise terminology and fuzzy concepts in discussions about research assessment. However, the solution offered by the author seems to suffer from the same problems. Isn’t the notion of a breakthrough equally imprecise and fuzzy? And should research assessment be focused exclusively on breakthroughs in the first place, or is this debatable? The author also refers to Nobel Prizes, but is it really clear what Nobel Prizes do and do not tell us?

2. As the author discusses in Section 1.4, citation distributions tend to follow patterns that can be described by lognormal distributions and double rank power laws, although there are also deviations from these patterns. While these patterns are interesting, it is less clear to me why these patterns should be considered ‘ideal models’ and why deviations from these ‘ideal models’ lead to ‘incorrect assessments’. The step from descriptive patterns that characterize citation data to normative patterns that tell us what is correct and what is not seems questionable to me.

3. Statistics based on top 1% publications tend to rely on very small numbers. Doesn’t this lead to a lot of statistical noise, and couldn’t this noise explain some of the results reported in the paper? For instance, could this noise explain why the correlations in panels B and D in Figure 5 are much weaker than the correlations in panels A and C?

4. The author presents a critique of standard bibliometric indicators such as P(top 10%) / P. I think the author should acknowledge other critiques of these ‘size-independent’ indicators, in particular the critique by Abramo and D’Angelo (e.g., …). I wonder whether these different critiques reinforce each other, or whether they perhaps partly contradict each other.
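Comments 3 and 4 both concern percentile-based indicators such as P(top 10%) / P. The minimal sketch below shows how that share is computed and why top 1% statistics are noisier. It assumes a simple binomial model in which each paper independently reaches the global top class with probability equal to the class size. The function names are hypothetical. The strict threshold comparison also ignores the fractional handling of papers tied at the threshold that ranking producers may apply.

```python
import numpy as np


def pp_top(global_citations, unit_citations, top=0.10):
    """Share of a unit's papers above the global top-`top` citation threshold.

    Simplification: strict '>' against the global (1 - top) quantile, with no
    fractional counting of papers tied at the threshold.
    """
    threshold = np.quantile(np.asarray(global_citations, dtype=float), 1 - top)
    unit = np.asarray(unit_citations, dtype=float)
    return np.count_nonzero(unit > threshold) / unit.size


def binomial_relative_se(n_papers, top):
    """Relative standard error of P(top x%) / P, assuming each of `n_papers`
    papers independently lands in the top class with probability `top`."""
    return np.sqrt((1 - top) / (n_papers * top))


# A unit with 1,000 papers at the world average: about 100 expected top 10%
# papers but only about 10 expected top 1% papers.
for top in (0.10, 0.01):
    print(top, round(float(binomial_relative_se(1_000, top)), 3))
```

Under this model, the relative standard error for such a unit is about 0.095 for its top 10% share but about 0.31 for its top 1% share. Real citation data need not follow the binomial assumption, so these figures are only indicative.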
