InCites Benchmarking & Analytics: Baselines

InCites Benchmarking & Analytics is a research analytics tool.

What is a Baseline?

A baseline is the average performance of a global set of publications with the same subject area, document type and year. For example, a global set might consist of all articles in the field of chemistry published in 2006. Baselines and subject schemas create useful reference points for comparison, and they are the basis of the normalization used to overcome subject bias. Baselines are calculated using a whole counting method: every paper in a subject area counts in full towards that area's baseline, regardless of whether the paper is also assigned to other subject areas.

Baseline Calculation Example
Article ID   Times Cited   Subject Categories                          Document Type   Year
A            0             Chemistry, Organic                          Article         2010
B            12            Chemistry, Organic & Chemistry, Physical    Article         2010
C            5             Chemistry, Physical                         Article         2010
D            8             Chemistry, Organic                          Review          2010


This table shows sample publications A to D, which differ in subject category and document type. To keep the demonstration simple, all papers share the same publication year; in practice, baselines are calculated separately for each year. The citation impact (average citations per paper) baseline for each combination of subject, year and document type is calculated as the arithmetic mean:

Baseline formula:

\[ e_{ftd} = \frac{\sum c_{ftd}}{p_{ftd}} \]

Where: e = the expected citation rate or baseline, c = Times Cited, p = the number of papers, f = the field or subject area, t = year, and d = document type.


For articles in the field Chemistry, Organic published in 2010 (A and B) it would be: (0 + 12) / 2 = 6

For articles in Chemistry, Physical published in 2010 (B and C) it would be: (12 + 5) / 2 = 8.5

For reviews in Chemistry, Organic published in 2010 (D) it would be: 8 / 1 = 8
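
The worked example above can be reproduced with a short script. This is a minimal sketch, not InCites code: the data layout and the function name `compute_baselines` are assumptions made for illustration. It applies whole counting by adding each paper in full to every subject category it is assigned to, then takes the mean for each combination of subject, year and document type.

```python
from collections import defaultdict

# Sample publications from the table above:
# (article id, times cited, subject categories, document type, year)
papers = [
    ("A", 0, ["Chemistry, Organic"], "Article", 2010),
    ("B", 12, ["Chemistry, Organic", "Chemistry, Physical"], "Article", 2010),
    ("C", 5, ["Chemistry, Physical"], "Article", 2010),
    ("D", 8, ["Chemistry, Organic"], "Review", 2010),
]


def compute_baselines(papers):
    """Return the mean citations per paper for each (subject, year, type).

    Hypothetical helper for illustration only.
    """
    # key -> [sum of citations, number of papers]
    totals = defaultdict(lambda: [0, 0])
    for _article_id, times_cited, subjects, doc_type, year in papers:
        # Whole counting: a paper counts in full in every one of its subjects.
        for subject in subjects:
            entry = totals[(subject, year, doc_type)]
            entry[0] += times_cited
            entry[1] += 1
    return {key: cites / count for key, (cites, count) in totals.items()}


result = compute_baselines(papers)
for (subject, year, doc_type), baseline in sorted(result.items()):
    print(f"{subject} | {doc_type} | {year}: {baseline}")
```

Paper B contributes its 12 citations to both the Chemistry, Organic and the Chemistry, Physical article baselines, which is what distinguishes whole counting from a fractional scheme that would split the paper between its categories.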

Note: The citation distribution for any set of publications is typically skewed: a small number of papers are highly cited and a large number have relatively few citations. Because baselines are based on the mean of a set of papers, and the mean is pulled upward by highly cited papers, the mean will typically be considerably higher than the median. As a result, more than half of the publications typically fall below the mean.
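
The effect of skew can be illustrated with a small made-up data set. The citation counts below are hypothetical and chosen only to show the pattern; they are not InCites data.

```python
from statistics import mean, median

# Hypothetical citation counts for ten papers in one baseline group:
# nine papers with few citations and one highly cited paper.
citations = [0, 0, 1, 1, 2, 2, 3, 4, 7, 80]

avg = mean(citations)    # the baseline-style mean
mid = median(citations)  # the middle of the distribution

# Count how many papers fall below the mean.
below_mean = sum(1 for c in citations if c < avg)

print(f"mean = {avg}, median = {mid}, papers below mean = {below_mean} of {len(citations)}")
```

In this sample the single highly cited paper lifts the mean to 10 while the median stays at 2, so nine of the ten papers sit below the mean.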

The following chart shows the differences in Citation Impact between various subject categories. Mathematics has a lower Citation Impact than biochemistry & molecular biology. Recent publications exhibit lower Citation Impact because they have had less time to accrue citations than older papers. Citation Impact can vary significantly across disciplines and time periods, so it cannot be used effectively to compare entities in different subjects or years. In these cases, it is preferable to use some form of normalization to allow for the differences in fields and time (see Category Normalized Citation Impact, % Documents in Top 1%, % Documents in Top 10% and Average Percentile).
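
As a rough illustration of how a baseline supports normalization, the sketch below divides each sample paper's citation count by the baseline for its subject, year and document type, using the baseline values that follow from the sample table (6, 8.5 and 8). The function name `normalized_impact` is hypothetical, and averaging the ratios for a paper assigned to more than one category is an assumption of this sketch; consult the Category Normalized Citation Impact documentation for the exact definition of that indicator.

```python
# Baselines derived from the sample table, keyed by (subject, year, document type).
baselines = {
    ("Chemistry, Organic", 2010, "Article"): 6.0,
    ("Chemistry, Physical", 2010, "Article"): 8.5,
    ("Chemistry, Organic", 2010, "Review"): 8.0,
}


def normalized_impact(times_cited, subjects, doc_type, year, baselines):
    """Ratio of actual to expected citations (hypothetical helper).

    Assumption: for a paper in several subject categories, the ratios
    for each category are averaged.
    """
    ratios = [times_cited / baselines[(s, year, doc_type)] for s in subjects]
    return sum(ratios) / len(ratios)


paper_a = normalized_impact(0, ["Chemistry, Organic"], "Article", 2010, baselines)
paper_b = normalized_impact(
    12, ["Chemistry, Organic", "Chemistry, Physical"], "Article", 2010, baselines
)
paper_d = normalized_impact(8, ["Chemistry, Organic"], "Review", 2010, baselines)

print(paper_a, round(paper_b, 3), paper_d)
```

A value of 1.0 means a paper is cited exactly at the baseline for comparable publications, which makes papers from different subjects, years and document types comparable on a common scale.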