
Web of Science Raw Data (XML)

User guide for Web of Science raw data

Web of Science data quality

63 million article records | 1 billion cited references | 117 years of coverage

Web of Science Core Collection includes reliable, complete metadata from over 12,500 high-quality journals from around the world, covering more than 250 disciplines across the sciences, social sciences, and humanities. Conference proceedings and book data are also available. Coverage extends back to 1900 and, to date, comprises over 63 million article records and 1 billion cited references.

Indexes in the Web of Science Core Collection include:

  • Science Citation Index Expanded (SCI-EXPANDED): 1900-present
  • Social Sciences Citation Index (SSCI): 1900-present
  • Arts & Humanities Citation Index (A&HCI): 1975-present
  • Conference Proceedings Citation Index – Science (CPCI-S): 1990-present
  • Conference Proceedings Citation Index – Social Science & Humanities (CPCI-SSH): 1990-present
  • Book Citation Index – Science (BKCI-S): 2005-present
  • Book Citation Index – Social Sciences & Humanities (BKCI-SSH): 2005-present
  • Emerging Sources Citation Index (ESCI): 2015-present
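When filtering raw records by publication year, the coverage ranges above determine which indexes could contain a given record. The Python sketch below encodes each index's start year in a lookup table and lists the indexes whose coverage includes a year. The table and the `indexes_covering` helper are hypothetical conveniences transcribed from the list, not part of the delivered XML.

```python
# Coverage start years transcribed from the index list above.
# This table is a hypothetical helper, not a field of the raw XML data.
INDEX_START_YEAR = {
    "SCI-EXPANDED": 1900,
    "SSCI": 1900,
    "A&HCI": 1975,
    "CPCI-S": 1990,
    "CPCI-SSH": 1990,
    "BKCI-S": 2005,
    "BKCI-SSH": 2005,
    "ESCI": 2015,
}


def indexes_covering(year: int) -> list:
    """Return (sorted) the indexes whose coverage, start year to present, includes `year`."""
    return sorted(name for name, start in INDEX_START_YEAR.items() if year >= start)


if __name__ == "__main__":
    print(indexes_covering(1985))
```

Because every range runs to the present, only the start year needs to be checked.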

Some key data elements:

  • ORCID identifiers are included on over 6.2 million records to support author disambiguation
  • Funding acknowledgements, including agency names and grant numbers, are indexed
  • Full author and institutional affiliation information is indexed to improve research attribution and collaboration analysis
  • Institution names are extensively unified, aggregating complex naming variations and sub-organizations
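Elements like these can be read from the raw data with a standard XML parser. The Python sketch below uses the standard library's `xml.etree.ElementTree` to extract authors (with an ORCID when present), affiliations, and funding grants from a single record. The record layout shown (`record`, `author`, `orcid`, `affiliation`, `grant`, and their attributes) is entirely hypothetical and chosen for illustration; the actual Web of Science XML schema uses its own element and attribute names.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL record layout: element and attribute names are placeholders,
# not the actual Web of Science XML schema.
SAMPLE = """
<record id="REC-1">
  <author orcid="0000-0002-1825-0097">Carberry, J</author>
  <author>Smith, A</author>
  <affiliation>Example University, Dept of Chemistry</affiliation>
  <grant agency="Example Science Foundation" number="ABC-123"/>
</record>
"""


def extract_fields(xml_text: str) -> dict:
    """Pull authors (with optional ORCID), affiliations and grants from one record."""
    rec = ET.fromstring(xml_text.strip())
    authors = [
        # .get() returns None when the orcid attribute is absent
        {"name": a.text, "orcid": a.get("orcid")}
        for a in rec.findall("author")  # document order = author order here
    ]
    affiliations = [aff.text for aff in rec.findall("affiliation")]
    grants = [(g.get("agency"), g.get("number")) for g in rec.findall("grant")]
    return {
        "id": rec.get("id"),
        "authors": authors,
        "affiliations": affiliations,
        "grants": grants,
    }


if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

Keeping the ORCID as an optional field mirrors the fact that only a subset of records carry one.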

DAIS - Distinct Author Identification System

DAIS disambiguates authors in the Web of Science Core Collection by assigning an author ID to each authorship, that is, to each occurrence of an author on a paper.

DAIS has four major components:

  • Initial Clustering – Identifies the distinct authors across the whole database from scratch, without an authority list of known authors.
  • Ongoing – Assigns author IDs to new records as they are added to the database.
  • RID Integration – Integrates manually created publication lists (ResearcherID profiles) with DAIS.
  • Reevaluation – Re-runs a fresh, full clustering on a per-name basis, discovering authors who were not known at the time of the initial clustering.
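The internals of DAIS (its features, weights, and thresholds) are not described here, so the Python sketch below is only a generic illustration of the four components using a common technique: block authorships by a name key, then link pairs with matching evidence via union-find. Everything in it is hypothetical: the `Authorship` fields, the `name_key` blocking key, and the toy evidence rule in `same_author` (ORCIDs decide when both are present, otherwise any shared coauthor links two authorships).

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Authorship:
    """One author on one paper (hypothetical, heavily simplified)."""
    paper_id: str
    name_key: str                  # blocking key, e.g. "smith_j"
    orcid: Optional[str] = None
    coauthors: frozenset = frozenset()


def same_author(a: Authorship, b: Authorship) -> bool:
    """Toy evidence rule: ORCIDs decide when both exist, else any shared coauthor."""
    if a.orcid and b.orcid:
        return a.orcid == b.orcid
    return bool(a.coauthors & b.coauthors)


def cluster_block(block: list[Authorship]) -> list[list[Authorship]]:
    """Union-find over one name block: link every pair with matching evidence."""
    parent = list(range(len(block)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(block)):
        for j in range(i):
            if same_author(block[i], block[j]):
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i, a in enumerate(block):
        groups[find(i)].append(a)
    return list(groups.values())


def initial_clustering(authorships: list[Authorship]) -> dict:
    """Initial clustering: no authority list; block by name key, cluster each block."""
    blocks = defaultdict(list)
    for a in authorships:
        blocks[a.name_key].append(a)
    return {key: cluster_block(b) for key, b in blocks.items()}


def assign_new(clusters: dict, a: Authorship) -> None:
    """Ongoing: attach a new authorship to the first matching cluster, else start one."""
    block = clusters.setdefault(a.name_key, [])
    for cluster in block:
        if any(same_author(a, m) for m in cluster):
            cluster.append(a)
            return
    block.append([a])


def integrate_list(clusters: dict, name_key: str, paper_ids: set) -> None:
    """RID-style integration: merge every cluster touching a curated paper list."""
    block = clusters.get(name_key, [])
    hit = [i for i, c in enumerate(block) if any(m.paper_id in paper_ids for m in c)]
    if len(hit) > 1:
        merged = [m for i in hit for m in block[i]]
        clusters[name_key] = [c for i, c in enumerate(block) if i not in hit] + [merged]


def reevaluate(clusters: dict, name_key: str) -> None:
    """Reevaluation: discard one name's clusters and re-cluster them from scratch."""
    members = [m for c in clusters.get(name_key, []) for m in c]
    clusters[name_key] = cluster_block(members)
```

Union-find makes linking transitive, so a single spurious evidence link can merge two different people; this is one reason a real system would need stronger, weighted evidence. In this toy, `reevaluate` also ignores curated merges from `integrate_list`, whereas a production system would presumably treat them as constraints.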

Further reading