2. Scientometrics

The application of knowledge (as manifested in entrepreneurship and innovation, research and development, and software and product design) is one of the key sources of growth in the global economy.

Scientific publications provide a source of information to measure and model the knowledge base of societies, firms, regions and organisations. Scientific communications are extremely well archived, and therefore, we have a wealth of data at our disposal when we study the dynamics of the sciences and science-based technologies.

Scientometrics is the science of measuring and analysing science. Modern scientometrics (also known as bibliometrics) is mostly based on the work of Derek J. de Solla Price and Eugene Garfield. The latter founded the Institute for Scientific Information (ISI), whose citation data are still heavily used for scientometric analysis. This class focuses on the strengths and weaknesses of scientometric data. Furthermore, several software tools will be introduced to organise scientometric data in a relational database. In this course we will use SAINT (developed by the Rathenau Institute) and the ISI freeware programs (developed by Loet Leydesdorff). In addition, CorText Manager provides an online platform that gives users access to a range of analysis tools developed by CorText. This application enables users to upload data sets from disparate sources and to initiate treatments (scripts) that perform remote analyses and produce maps of the primary data. A private space allows users to launch and test their own chains of treatments before possibly making results public.

Relational database

In this section, we examine ways to analyze the data we have collected in terms of systems of relations. Why do we need relations? Authors may be related to different titles and titles to different authors. Thus, the data span networks of relations. A common measure of such relations is the extent to which papers cite the same previous papers. This is called bibliographic coupling. Conversely, co-citation is the configuration in which papers are cited together by (rather than citing) the same other papers. Later on in this course, we will look at social network analyses in more detail. Other types of relations are co-words (Courtial 1994) and word-reference co-occurrences (Van den Besselaar and Heimeriks 2006). A good overview of different types of scientometric analyses is provided by Boyack and Klavans (2010).
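The difference between the two measures can be made concrete with a small sketch. The paper and reference labels below are hypothetical toy data, and the function names are chosen only for illustration: bibliographic coupling counts the references that two citing papers share, while co-citation counts how often two references appear together in the same reference list.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing paper maps to the set of references it cites.
cites = {
    "P1": {"R1", "R2", "R3"},
    "P2": {"R2", "R3", "R4"},
    "P3": {"R4"},
}


def bibliographic_coupling(cites):
    """For every pair of citing papers, count the references they share."""
    strength = {}
    for a, b in combinations(sorted(cites), 2):
        shared = len(cites[a] & cites[b])
        if shared:
            strength[(a, b)] = shared
    return strength


def co_citation(cites):
    """For every pair of references, count the papers that cite both."""
    counts = Counter()
    for refs in cites.values():
        for pair in combinations(sorted(refs), 2):
            counts[pair] += 1
    return dict(counts)


coupling = bibliographic_coupling(cites)
cocitation = co_citation(cites)
print(coupling)    # pairs of citing papers with their number of shared references
print(cocitation)  # pairs of references with the number of papers citing both
```

In this toy set, P1 and P2 are coupled because they share R2 and R3, while R2 and R3 are co-cited because both P1 and P2 cite them together.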

In order to perform analyses, we need to organize our data in a relational database. A relational database stores data in tables and matches records across tables by using common characteristics found within the data set. The resulting groups of data are organized and are much easier for many people to understand. For example, a data set containing all publications in a field can be grouped by the year each publication was published, the country of origin, the topics, the cited references, journal names, authors’ last names and so on. Such a grouping uses the relational model; the technical term for the resulting structure of tables and fields is a schema. Hence, such a database is called a “relational database.”
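As a minimal sketch of what such a schema can look like, the example below uses Python's built-in sqlite3 module as a stand-in for MS Access. The table names, field names and rows are assumptions made up for illustration; they are not the tables that SAINT or the ISI programs generate. Because one article can have several authors and one author several articles, the relation is stored in a separate link table.

```python
import sqlite3

# In-memory database; all table names, field names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
CREATE TABLE authors  (id INTEGER PRIMARY KEY, last_name TEXT);
-- Link table: one row per (article, author) relation.
CREATE TABLE article_authors (
    article_id INTEGER REFERENCES articles(id),
    author_id  INTEGER REFERENCES authors(id)
);
""")
conn.executemany("INSERT INTO articles VALUES (?, ?, ?)",
                 [(1, "Paper A", 2004), (2, "Paper B", 2007)])
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(1, "Jones"), (2, "Smith")])
conn.executemany("INSERT INTO article_authors VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])

# Follow the relations from authors to the titles they wrote.
author_titles = conn.execute("""
    SELECT au.last_name, a.title
    FROM authors AS au
    JOIN article_authors AS aa ON aa.author_id = au.id
    JOIN articles AS a ON a.id = aa.article_id
    ORDER BY au.last_name, a.title
""").fetchall()
print(author_titles)
```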

There are several good tutorials for MS Access 2010. In this course we will focus mostly on constructing queries in MS Access. A “query” refers to the action of instructing the database to return some (or all) of the data in your database. In other words, you are “querying” the database for data that match given criteria.

At the bottom right side of the screen, you will have noted the option “Output Records”. Here you can save records (at a maximum of 500 at a time) for further processing. The ISI freeware programs, available at http://www.leydesdorff.net/software/isi/index.htm, allow the user to organize these files for relational database management.

Alternatively, the Rathenau Institute has developed SAINT, which stands for Science Assessment Integrated Network Toolkit. This is a set of tools for bibliometric and patentometric research, including a parser program for bibliographic data downloaded from the ISI/Web of Science. A first alpha version of SAINT is now available.

A detailed SAINT Manual will guide you step by step through the use of the scientometric tools developed by the Rathenau Institute and other parties. These tools allow you to visualize and perform network analysis on data from Thomson Reuters’ Web of Science.

SAINT can be used for:

1. Turning raw data into a relational database
2. Cutting titles and abstracts into separate words for analysis
3. Building queries to answer your questions, from simple statistics to complex patterns
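The second use, cutting titles into separate words, can be sketched in a few lines. This is an illustrative approximation and not SAINT's own word splitter: the stopword list and the rule that keeps hyphenated terms together are assumptions. The two titles are taken from the reference list of this section.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "of", "the", "in", "on", "for", "to"}


def title_words(title):
    """Lowercase a title and split it into words, keeping hyphenated terms whole."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower())
    return [w for w in words if w not in STOPWORDS]


titles = {
    1: "A coword analysis of scientometrics",
    2: "Mapping research topics using word-reference co-occurrences",
}

# One (article id, word) row per word, ready to load into a link table.
word_rows = [(article_id, word)
             for article_id, title in titles.items()
             for word in title_words(title)]
word_counts = Counter(word for _, word in word_rows)
print(word_rows)
```

Storing one row per article and word is what later makes co-word queries possible, since two words co-occur whenever they share an article identifier.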

1. Run the SAINT installation and select the ISI data importer
2. Select your input .txt files
3. Select the output database (use MS Access initially; MySQL databases for file sizes over 2 GB)
4. Examine the database in Access



The ISI data importer tool displays three tab pages. On the first page, you can select the input file or files that you want to import. Type or copy/paste the name of the file into the box, or click on the file selector button to display a file dialog. From the file dialog, you can easily select multiple files (located in a single directory) at once. The last ten selected files will be stored, so you can easily access them again using the drop-down list. Click on the little arrow inside the box for the file name to display the previously used files.

Once you have selected the files that contain the raw ISI data, change to the Output tab. Here, another file selection box is presented. Use this Output file box to select the file to which you want to write your data. Note that this file does not have to exist yet. If you enter or select a file that does not exist yet, that file will be created.


Queries are used to interact with the data in a database:

  • search and select (e.g. all articles where country is Netherlands)
  • combine and compare (merge two tables; calculate similarity)
  • statistical analysis (count, sum, average, etc.)

Please briefly discuss some standard queries. What information do they provide?

  • Count of articles per year and per country
  • Bibliographic coupling
  • Co-authorship
  • A query that couples articles to authors only
  • A query that couples articles to research addresses only
  • A query that couples authors to keywords
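To give a feel for what such queries look like underneath the Access design view, here is a hedged sketch in SQL, run through Python's sqlite3 module. The tables, fields and rows are invented for the example and simplified (a single country per article, for instance); the tables generated by the ISI parser will be named and structured differently.

```python
import sqlite3

# Hypothetical, simplified tables; not the schema produced by SAINT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, year INTEGER, country TEXT);
CREATE TABLE article_authors (article_id INTEGER, author_id INTEGER);
INSERT INTO articles VALUES
    (1, 2010, 'Netherlands'), (2, 2010, 'Netherlands'), (3, 2011, 'Germany');
INSERT INTO article_authors VALUES (1, 10), (1, 11), (2, 10), (2, 11), (3, 12);
""")

# Search and select: all articles where country is Netherlands.
dutch = conn.execute(
    "SELECT id FROM articles WHERE country = 'Netherlands' ORDER BY id"
).fetchall()

# Statistical analysis: count of articles per year and per country.
per_year_country = conn.execute("""
    SELECT year, country, COUNT(*) AS n_articles
    FROM articles
    GROUP BY year, country
    ORDER BY year, country
""").fetchall()

# Co-authorship: join the link table to itself on the article identifier.
# The '<' condition keeps each author pair once and excludes self-pairs.
coauthors = conn.execute("""
    SELECT a.author_id, b.author_id, COUNT(*) AS n_joint_articles
    FROM article_authors AS a
    JOIN article_authors AS b
      ON a.article_id = b.article_id AND a.author_id < b.author_id
    GROUP BY a.author_id, b.author_id
""").fetchall()
print(dutch, per_year_country, coauthors)
```

The co-authorship query is an instance of "combine and compare": the same self-join pattern, applied to a table linking articles to cited references, yields bibliographic coupling.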

When you build queries, always link tables using unique identifiers.

Use numbers as identifiers:

  • matching on numbers is much faster than matching on strings
  • use the type Long Integer when you make your own

SAINT produces unique identifiers for all items. When you make your own data (e.g. harmonised author names), make sure to add a unique identifier.
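As an illustration of adding identifiers to your own data, the sketch below assigns one integer identifier per harmonised author name. The normalisation rule is deliberately crude and is only an assumption for the example; real name harmonisation needs manual checking, and the function names are hypothetical.

```python
def harmonise(name):
    """Crude normalisation: drop commas and periods, lowercase, collapse spaces."""
    cleaned = name.replace(",", " ").replace(".", "").lower()
    return " ".join(cleaned.split())


def assign_ids(names):
    """Give every distinct harmonised name a sequential integer identifier."""
    ids = {}
    for name in names:
        key = harmonise(name)
        if key not in ids:
            ids[key] = len(ids) + 1
    return ids


# Two spellings of the same author and one different author.
raw_names = ["Geim, A.K.", "GEIM AK", "Novoselov, K.S."]
author_ids = assign_ids(raw_names)
print(author_ids)
```

Both spellings of the first author end up with the same integer, so tables can be linked on that number rather than on the name strings.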

Scientific publications can be harvested from the Web of Science of the Institute for Scientific Information (Thomson) at http://www.isiknowledge.com/. You can access this address directly or through the digital library of Utrecht University.

Go to “advanced search”. Let’s search for the Nobel Prize-winning author Andrei Geim. If you click on the search results, you can inspect them one by one. They are organized with the most recent papers on top of the list. Scroll down and find one with citations. Click on it and study the layout of the record. As you see, you can click on the cited and citing references. What is the difference between these two?

Go back to the listing. On the right side is a screen that enables you to “Analyze Results” and to make a “Citation Report”. The citation report, for example, informs you about the development over time. The picture raises questions. Can you formulate one? The tab “Analyze Results” allows you to generate distributions. Make a distribution of the authors in this set.

In the next step, we will download the records in order to proceed with more options for the scientometric analysis. To that end, enter the total number of records (1 to 100+) in the third option under “Output Records”. Then click on “Add to Marked List”.

Enter the “Marked List” at the top of the screen and save the records to file after tagging all the fields that may be of interest to us at a later stage. (Take them all.) The computer now saves the full records (as plain text!) and thereafter you can save them in a folder as “data.txt”.

The text file can be parsed with the ISI parser. An Access database will be created, with all ISI information organised in a relational database. Several standard queries are automatically generated. I used this text file (right click).
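To show what parsing involves, here is a simplified sketch of a reader for the tagged plain-text export. It is not the ISI parser itself and assumes the usual layout of such files: each field starts with a two-letter tag (for example AU, TI, PY), continuation lines are indented, ER closes a record and EF closes the file. The sample text and the function name are made up for the example.

```python
def parse_isi(text):
    """Parse tagged plain text into a list of {tag: [values]} records."""
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        head, value = line[:2], line[3:].strip()
        if head in ("FN", "VR"):
            continue  # file header lines, not part of any record
        if head == "EF":
            break  # end of file marker
        if head == "ER":
            records.append(current)  # end of record
            current, tag = {}, None
        elif head.strip():
            tag = head  # a new field starts
            current.setdefault(tag, []).append(value)
        elif tag is not None:
            current[tag].append(value)  # indented continuation of the last field
    return records


# Made-up sample in the assumed layout.
sample = """FN Example export
VR 1.0
PT J
AU Author, A
   Author, B
TI First example title
PY 2010
ER

PT J
AU Author, C
TI Second example title
PY 2011
ER

EF
"""

records = parse_isi(sample)
print(len(records), records[0]["AU"])
```

Every value is kept as a list because fields such as AU and CR (cited references) hold one entry per line, which is what makes them natural candidates for separate tables in the relational database. A title that wraps over several lines would also arrive as several list items and would need to be joined.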


In order to get these results you have to construct queries. A query can be created in Access by opening the Create tab and choosing the option Query Design.

You are asked which tables contain the information that you would like to have combined in your query. Highlight the tables that have the relevant data, then click Add. The tables will appear in a grey window; then click and drag the relevant fields from each table that you want to appear in the query.

The journals are listed by identifier in the ARTICLES table. The journal names are listed in the JOURNALS table. You have to create a relationship between the two tables by clicking on Journals-ID in ARTICLES and dragging it to ID in JOURNALS.

First select the field ‘Journal-ID’ from the ARTICLES table, and after that select ID. In order to make a count, press the Sigma (Totals) button on the top menu bar. A Total row appears, set to ‘Group By’ by default; change it to COUNT in the ID column.
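The query built in the design view corresponds to a join with a grouped count. The sketch below expresses the same idea in SQL via Python's sqlite3 module; the lowercase table and field names and the rows are assumptions for illustration, so map them to the ARTICLES and JOURNALS fields in your own database.

```python
import sqlite3

# Hypothetical miniature versions of the ARTICLES and JOURNALS tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE journals (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    journal_id INTEGER REFERENCES journals(id)
);
INSERT INTO journals VALUES (1, 'Journal A'), (2, 'Journal B');
INSERT INTO articles VALUES (1, 1), (2, 1), (3, 2);
""")

# Group by journal and count the article identifiers in each group.
journal_counts = conn.execute("""
    SELECT j.name, COUNT(a.id) AS n_articles
    FROM articles AS a
    JOIN journals AS j ON j.id = a.journal_id
    GROUP BY j.id, j.name
    ORDER BY n_articles DESC, j.name
""").fetchall()
print(journal_counts)
```

The JOIN plays the role of the relationship line dragged between the two tables, and GROUP BY with COUNT corresponds to the settings in the Total row.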

A similar QUERY can be constructed using the tables COUPLE-ARTICLES-CITEDREFERENCES (field CITED REFERENCES) and CITEDREFERENCES (count of ID).

You need three tables in your query to get the most frequently cited journals.
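The text does not name the three tables, so the following is only one plausible reading, with hypothetical table and field names: a link table coupling articles to cited references, a table of cited references, and a table of cited journal names. Under that assumption, the query could be sketched as follows.

```python
import sqlite3

# All three tables and their contents are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cited_journals (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cited_references (
    id INTEGER PRIMARY KEY,
    cited_journal_id INTEGER REFERENCES cited_journals(id)
);
-- Link table: one row each time an article cites a reference.
CREATE TABLE couple_articles_citedreferences (
    article_id INTEGER,
    cited_reference_id INTEGER REFERENCES cited_references(id)
);
INSERT INTO cited_journals VALUES (1, 'Journal X'), (2, 'Journal Y');
INSERT INTO cited_references VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO couple_articles_citedreferences VALUES
    (100, 1), (100, 2), (101, 1), (101, 3);
""")

# Count citations per cited journal by walking link table -> reference -> journal.
most_cited = conn.execute("""
    SELECT cj.name, COUNT(*) AS n_citations
    FROM couple_articles_citedreferences AS c
    JOIN cited_references AS cr ON cr.id = c.cited_reference_id
    JOIN cited_journals AS cj ON cj.id = cr.cited_journal_id
    GROUP BY cj.id, cj.name
    ORDER BY n_citations DESC, cj.name
""").fetchall()
print(most_cited)
```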



Boyack, Kevin W., and Richard Klavans. “Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately?” Journal of the American Society for Information Science and Technology 61.12 (2010): 2389-2404.

Courtial, Jean Pierre. “A coword analysis of scientometrics.” Scientometrics 31.3 (1994): 251-260.

Etzkowitz, Henry, and Loet Leydesdorff. “The dynamics of innovation: From National Systems and ‘Mode 2’ to a Triple Helix of university-industry-government relations.” Research Policy 29.2 (2000): 109-123.

Science in Transition. “Why Science Does Not Work as It Should and What To Do about It.” Position paper, October 17, 2013.

Van den Besselaar, Peter, and Gaston Heimeriks. “Mapping research topics using word-reference co-occurrences: A method and an exploratory case study.” Scientometrics 68.3 (2006): 377-393.