MMI Lesson 5: The Geography of Science and Innovation

1. Spiky World

Richard Florida shows in The Atlantic Monthly that by almost any measure the international economic landscape is not evenly distributed at all. ‘Our world is amazingly spiky. In terms of both sheer economic activity and cutting-edge innovation, surprisingly few regions truly matter in today’s global economy’ (Florida, 2005:p.48). This uneven distribution becomes clear when measuring the clustering of people, economic activity and patents. Population and economic activity are both unevenly distributed, but it is innovation, the engine of economic growth, that is most concentrated (Florida, 2005:p.49). According to the patent distribution, only a few places in the world produce most of the innovations (see figure 1).

Figure 1: The world’s total patent distribution. The higher the spikes the more patents are produced in that particular city (Florida, 2005:p.50).

Moreover, concentrations of creative and talented people are particularly important for innovation. Ideas flow more freely, are honed more sharply and can be put into practice more quickly when large numbers of innovators, implementers and financial backers are in constant contact with one another (Cooke, 2004). Figure 2 shows that the world’s most prolific and influential scientific researchers are even more concentrated than patents. These researchers overwhelmingly reside in U.S. and European cities.

Figure 2. The world’s distribution of the most prolific and influential scientific researchers (Florida, 2005:p.51).

Further reading is provided by Luís M. A. Bettencourt et al. in Growth, innovation, scaling, and the pace of life in cities. They present empirical evidence indicating that the processes relating urbanization to economic development and knowledge creation are very general, being shared by all cities belonging to the same urban system and sustained across different nations and times. Many diverse properties of cities, from patent production and personal income to electrical cable length, are shown to be power law functions of population size with scaling exponents, β, that fall into distinct universality classes. Quantities reflecting wealth creation and innovation have β ≈ 1.2 > 1 (increasing returns), whereas those accounting for infrastructure display β ≈ 0.8 < 1 (economies of scale).
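To make the power-law relation concrete, the following is a minimal sketch, assuming the usual form Y = Y0 · N^β (Y is an urban quantity, N the population size). The city data are made up for illustration and are not taken from Bettencourt et al.; the function name is hypothetical. The exponent is recovered as the slope of a straight line fitted to the logarithms of both variables.

```python
import math

def fit_scaling_exponent(populations, values):
    """Least-squares slope of log(value) against log(population)."""
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic (made-up) cities that follow Y = Y0 * N**beta exactly.
populations = [1e5, 5e5, 1e6, 5e6, 1e7]
patents = [0.001 * n ** 1.2 for n in populations]   # superlinear: beta > 1
cable_km = [2.0 * n ** 0.8 for n in populations]    # sublinear: beta < 1

beta_patents = fit_scaling_exponent(populations, patents)
beta_cable = fit_scaling_exponent(populations, cable_km)
print(round(beta_patents, 3), round(beta_cable, 3))

# Doubling the population multiplies the quantity by 2**beta.
print(2 ** beta_patents, 2 ** beta_cable)
```

With β > 1, a city twice as large produces more than twice the output (increasing returns); with β < 1, it needs less than twice the infrastructure (economies of scale).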

Figures 1 and 2 demonstrate that innovations tend to take place in specific areas of the world. This implies that the innovative ecosystem of an area determines its innovation rate. However, the situation may be changing.

2. Globalisation of Science

Twenty years ago North America, Europe and Japan produced almost all of the world’s science. They were the aristocrats of technical knowledge, presiding over a centuries-old regime. They spent the most, published the most and patented the most. All good things, though, come to an end, and the reign of these scientific aristos is starting to look shaky. In 1990 they carried out more than 95% of the world’s research and development (R&D). By 2007 that figure was 76% (The Economist, 2010).

The UNESCO Science Report 2010 holds a mirror to the evolving status of science. It shows in particular how, while the disparities between countries and regions remain huge, the proliferation of digital information and communication technologies is increasingly modifying the global picture. By making codified information accessible worldwide, this proliferation is having a dramatic effect on the creation, accumulation and dissemination of knowledge, while at the same time providing specialized platforms for networking by scientific communities operating at a global level.

The distribution of research and development (R&D) efforts between North and South has changed with the emergence of new players in the global economy. A bipolar world in which science and technology (S&T) were dominated by the Triad made up of the European Union, Japan and the USA is gradually giving way to a multipolar world, with an increasing number of public and private research hubs spreading across North and South. Early and more recent newcomers to the S&T arena, including the Republic of Korea, Brazil, China or India, are creating a more competitive global environment by developing their capacities in the industrial, scientific and technological spheres.

3. An evolutionary model of the geography of innovation

In their paper The Aims and Scope of Evolutionary Economic Geography, Ron Boschma and Ron Martin (2010) argue that concepts and ideas from evolutionary economics (and evolutionary thinking more broadly) help to interpret and explain how the economic landscape changes over historical time. These concepts also help to reveal how situating the economy in space adds to our understanding of the processes that drive economic evolution, that is, to demonstrate how geography matters in determining the nature and trajectory of evolution of the economic system. According to the authors, evolutionary economic geography is concerned with the spatialities of economic novelty; with how the spatial structures of the economy emerge from the micro-behaviours of economic agents; with how, in the absence of central coordination or direction, the economic landscape exhibits self-organisation; and with how the processes of path creation and path dependence interact to shape geographies of economic development and transformation, and why and how such processes may themselves be place dependent.

Heimeriks and Boschma explore the worldwide spatial evolution of scientific knowledge production in biotechnology in the period 1986-2008. They employ a new methodology that identifies new key topics in biotech on the basis of the frequent use of title words in major biotech journals, as an indication of new cognitive developments within this scientific field. The analyses show that biotech is subject to a path- and place-dependent process of knowledge production, with a high degree of recurrence of similar key topics in consecutive years. Furthermore, slow-growth cities in biotech are characterized by topics that are less technologically related to other topics, while high-growth cities in biotech contribute to topics that are more related to the entire set of existing topics.

Please discuss the geographical developments in knowledge production in the past decades. Does this lead to follow-up questions? Would you be able to raise a research question about the geography of science and innovation? How would you proceed systematically to research this area?

4. The Geography of Corporate Invention

The Corporate Invention Board’s website is a new tool that aims to characterize the nature and extent of technological globalisation. It complements the “Industrial R&D Investment Scoreboard” (produced by the Institute for Prospective Technological Studies). The Industrial R&D Investment Scoreboard, an annual study of the European Commission, analyzes the performance of the 2000 industrial companies (1000 based within the European Union, 1000 outside) with the largest annual R&D investments. These 2000 companies accounted for more than 430 billion euros of investment in 2008. Using patent statistics, the Corporate Invention Board focuses on the outputs of these R&D investments. It thus provides information on the technologies and the localisation of these investments.

Using the CIB, can you provide some illustrations of globalisation in different sectors and technologies?

If you wish, you can also explore Mapping (USPTO) Patent Data using Overlays of Google Maps, developed by Loet Leydesdorff.

5. Mapping the Geography of Science

Loet Leydesdorff and Olle Persson developed an online appendix to their paper “Mapping the Geography of Science: Distribution Patterns and Networks of Relations among Cities and Institutes,” Journal of the American Society for Information Science and Technology 61(8) (2010) 1622-1634, where they provide (free) programs for mapping the geography of science.

The programs operate on a download of data in the standard (tagged) format at the Web-of-Science interface of the Science Citation Indices and then allow the user to make a geographic mapping of the institutional addresses and their relations using Google Earth, Google Maps, and/or Pajek. An example of an input file, to be used below, can be found here.
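To give a feel for the tagged format, here is a rough sketch of how the address lines could be pulled out of such a download. It assumes the common layout of Web of Science plain-text exports (two-letter field tags, C1 for addresses, indented continuation lines, ER closing each record). The sample records, the function names and the "last two comma-separated parts" heuristic are illustrative assumptions; this is not how cities1.exe is implemented.

```python
def extract_addresses(tagged_text):
    """Collect the C1 (address) lines of each record in a tagged export."""
    records, current, tag = [], [], None
    for line in tagged_text.splitlines():
        if not line.strip():
            continue
        if line.startswith("ER"):          # end of one record
            records.append(current)
            current, tag = [], None
            continue
        if line[:2].strip():               # a new two-letter field tag
            tag = line[:2]
        if tag == "C1":                    # tag line or its continuation
            current.append(line[3:].strip())
    return records

def city_and_country(address):
    """Heuristic: the last two comma-separated parts are city and country."""
    parts = [p.strip() for p in address.rstrip(".").split(",")]
    if len(parts) < 2:
        return "", parts[-1]
    return parts[-2], parts[-1]

# Made-up sample in the assumed tagged layout.
sample = """PT J
AU Doe, J
   Roe, R
C1 Univ Example, Dept Sci, Amsterdam, Netherlands.
   Example Inst Technol, Utrecht, Netherlands.
ER

PT J
AU Poe, E
C1 Univ Sample, Beijing 100000, Peoples R China.
ER
"""

records = extract_addresses(sample)
for addresses in records:
    print([city_and_country(a) for a in addresses])
```

Real exports contain variants this heuristic does not handle (for example, U.S. addresses that combine state, zip code and country in the final part), which is one reason to rely on the dedicated programs for the actual exercise.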

In this class, we will focus on mapping the geography of all authors contributing to the publications in a single journal. Using the Web of Science, please download the full records (plus cited references, as plain text) of a journal of your choice (e.g., Int J of Greenhouse Gas Control, Research Policy or Energy Policy) for two or more years (e.g., 1998 and 2008) for further processing.

These input files then have to be named “data.txt” (DOS text file) and to be stored in the same folders (one for 1998 and one for 2008) as the programs cities1.exe and cities2.exe. The two programs are to be run sequentially with an intermediate step.

Cities1.exe is derived from isi.exe and first organizes the data into relational databases. Among other things, it produces a file named “cities.txt”, which contains the city and country information (and the postcode, if available) in a standardized format. This file can be opened and its contents copied and pasted into the geocoder at http://www.gpsvisualizer.com/geocoder/. Choose the (default) Yahoo! format for the encoding. Please note that a maximum of 1000 lines can be processed at once.
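If cities.txt holds more than 1000 lines, it has to be pasted into the geocoder in several batches. A minimal sketch of such a split is given below; the helper name and the made-up lines are illustrative, and in practice one would read the lines from cities.txt instead.

```python
def batches(lines, size=1000):
    """Yield consecutive slices of at most `size` lines."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]

# Made-up stand-in for the contents of cities.txt.
# In practice: lines = open("cities.txt").read().splitlines()
lines = [f"City{i}, Country" for i in range(2500)]

sizes = [len(batch) for batch in batches(lines)]
print(sizes)  # [1000, 1000, 500]
```

Each batch can then be geocoded separately and the outputs concatenated, in the original order, before saving the result as a DOS text file.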

Cities1.exe prompts the user with three questions. The first two let the user set a threshold, either as a minimum percentage of the total set of city names in the data or as a minimum number of occurrences; both options limit the size of the network. The third question lets the user additionally obtain the cosine-normalized data matrix. This is not advised for large matrices because it adds to the computation time. For large datasets (> 200 nodes) the computation of the co-occurrence matrix may also be time-consuming. One can then interrupt the program and use the file “matrix.txt” in Pajek for the construction of the (necessary!) co-occurrence matrix. I’ll explain below how to do this, but let me first focus on the next steps in the main line of the process. (A fourth option in cities1.exe enables the user to turn off the generation of a network; in that case only information about the nodes is provided and cities.txt is generated.)
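As an illustration of what cosine normalization means, the sketch below computes the cosine between the occurrence vectors of each pair of cities. It assumes a small made-up matrix with documents as rows and cities as columns; the exact normalization applied by cities1.exe is not specified here, so this should be read as the textbook definition rather than the program's routine.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two occurrence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Made-up data: rows are documents, columns are cities (1 = city occurs).
occurrence = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]

columns = list(zip(*occurrence))   # one vector per city
cosine_matrix = [[cosine(ci, cj) for cj in columns] for ci in columns]

for row in cosine_matrix:
    print([round(value, 3) for value in row])
```

Unlike raw co-occurrence counts, the cosine corrects for the size of the cities involved: two small cities that always publish together score higher than two large cities that share only a fraction of their papers.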

The output of the geo-coding can be used as input into Cities2.exe after saving the file as a DOS text file. The program prompts for the name of this file. It produces a number of output files in various formats:

Cities.kml and cities2.kml can be read into Google Earth and/or uploaded to a website and then be read by Google Maps. These files can also be edited. (kml is a markup language.) Furthermore, kml files can directly be visualized at public websites such as http://display-kml.appspot.com/. Cities.kml contains a standard icon; cities2.kml a smaller and transparent one. In Google Maps, one may prefer cities2.kml; Network.kml contains only the network without the nodes.
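For readers who want to edit these files, it helps to know what a kml file looks like. The sketch below builds a minimal KML document with one placemark per city, using only elements of the public KML 2.2 format (Document, Placemark, name, Point, coordinates). The city list and the function name are made up, and the files written by cities2.exe will contain more than this (icons, styles, network lines).

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def build_kml(cities):
    """Return a KML string with one Placemark per (name, lat, lon) tuple."""
    ET.register_namespace("", KML_NS)   # serialize without a prefix
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in cities:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude, then altitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")

# Made-up coordinates for illustration.
cities = [("Amsterdam", 52.37, 4.9), ("Utrecht", 52.09, 5.12)]
kml_text = build_kml(cities)
print(kml_text)
```

Note the longitude-first order in the coordinates element; swapping the two values is a common cause of points appearing in the wrong place.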

Please provide two visualisations of the geography of science in your selected journal. How did the field develop in recent years (for example between 1998 and 2008)? Below, you’ll find some additional options for further processing of your data.

Inp_gps.txt can be read into the GPS Visualizer at http://www.gpsvisualizer.com/map_input?form=data. Change the following parameters:

a) Change “waypoints” into “default” underneath the screen input;

b) Change “Colorize using this field” into “custom field”;

c) Change “Resize using this field” into “custom field” and “custom resizing field” into “n”.

The resulting file contains both the nodes and the links. It may take the browser some time to load it. (If IE gives an error message, try Firefox.)
By default, networked nodes are shown in red and unconnected ones in orange. One can save the file as .html and edit it for use at one’s own website or locally. (This file can also be generated within the program BibExcel using the additional module at http://www8.umu.se/inforsk/geography/BibExcelGPSexercise.xls.)

Cities.paj can be read as a project file into Pajek for network visualization (use <F1> in Pajek); the information in this file can be combined with the file coast.net, which contains coastlines based on the geographical coordinates of the Coast Line extractor available at the website of the National Geophysical Data Center (NGDC) at http://rimmer.ngdc.noaa.gov/mgg/coast/getcoast.html. We used the World Coast Line data designed to a scale of 1:5,000,000 for this purpose.

The files can be edited and adapted to specific usages. For example, one can change the color of nodes in inp_gps.txt or the color of the network in network.kml. The size of the nodes is set proportionate to the logarithm of its occurrences + 1 (in order to prevent the zero-values of log(1)). The value of the links is equal to the co-occurrence value, but the main diagonal values (co-occurrences within the same city) are not considered. In other words, only the lower triangle of the co-occurrence matrix is used. In the .paj file the links are considered as arcs (but this can be changed into edges).
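The sizing and linking rules can be sketched as follows. Two assumptions are made here: the phrase "logarithm of its occurrences + 1" is read as log(n + 1) with the natural logarithm (reading it as log(n) + 1 would equally avoid zero-sized nodes), and the co-occurrence matrix is a made-up example. The function names are hypothetical.

```python
import math

def node_size(occurrences):
    """Assumed reading: size proportional to log(n + 1), natural log."""
    return math.log(occurrences + 1)

def links_from_lower_triangle(cooc):
    """Return (row, column, value) for non-zero cells below the diagonal."""
    links = []
    for i in range(len(cooc)):
        for j in range(i):              # j < i skips the main diagonal
            if cooc[i][j] > 0:
                links.append((i, j, cooc[i][j]))
    return links

# Made-up symmetric co-occurrence matrix for three cities.
cooc = [
    [5, 2, 0],
    [2, 3, 1],
    [0, 1, 4],
]

links = links_from_lower_triangle(cooc)
print(links)                       # [(1, 0, 2), (2, 1, 1)]
print(node_size(1) > 0)            # a single occurrence still gets a size
```

Because the matrix is symmetric, the lower triangle already contains every pair once, so each link is drawn a single time.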

Further processing in Pajek (ad 4)

Read the .paj file into Pajek (either under File or using <F1> on the keyboard). Also read coast.net into Pajek. Pajek allows one to keep both networks in a window on the screen, and one can then choose the option Union of vertices under Nets. The coastline information is now combined with the network information into a new set. An example of such a complete set can be found here. The network now contains both the address information and the world map. The world map is drawn in terms of edges and the network in terms of arcs; the two can therefore be manipulated independently. The full functionality of Pajek (e.g., centrality measures) remains available. Within the Draw screen of Pajek, one can zoom in by drawing a rectangle with a right-mouse click.

Further processing of the html (ad 3)

After drawing the map, click on “save your Google Map”. Use the option to view the source code in your browser and save the source code. Modify the title in line 4, the api-key in line 62, and, if so wished, set the zoom to 2 in line 77. Api keys for Google Maps can be obtained for free at http://code.google.com/apis/maps/signup.html. The file will work without an api key on your local computer and with this api key at your website. For an example of a resulting file, see http://www.leydesdorff.net/maps/is2009.html.

A faster way to generate the co-occurrence matrix

Cities1.exe will automatically generate a co-occurrence matrix, which is needed in cities2.exe for the construction of the network. However, this procedure is time-consuming since it is not based on matrix algebra. (One may wish to run this routine during the night.) Alternatively, the program will indicate after a while that one can interrupt using Alt-C. The user is then prompted with the option to discontinue the operation.

At that moment, a file “matrix.txt” has already been generated, which can be read into Pajek as a network file (File > Read > Network). The co-occurrence matrix can be made in Pajek by choosing: Net > Transform > 2-Mode to 1-Mode > Columns. Save the resulting network as a valued matrix with the .mat extension (File > Network > Save). This file should be named “pajek.mat”, and can be read by Paj2Cooc.Exe. This program generates the file coocc.dbf, which is needed for cities2.exe or inst2.exe. Note that previous files with the same name are overwritten both by Pajek and by these programs.
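The matrix-algebra idea behind the 2-Mode to 1-Mode step can be sketched in a few lines. The example assumes that the two-mode matrix has documents as rows and cities as columns (which is why Columns is chosen in Pajek) and uses made-up data; it shows the plain matrix product of the transposed matrix with itself, not Pajek's own implementation.

```python
def cooccurrence(two_mode):
    """Column-by-column co-occurrence: the product of A-transposed and A."""
    n_cols = len(two_mode[0])
    return [
        [sum(row[i] * row[j] for row in two_mode) for j in range(n_cols)]
        for i in range(n_cols)
    ]

# Made-up data: rows are documents, columns are cities.
two_mode = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]

cooc = cooccurrence(two_mode)
for row in cooc:
    print(row)
# [3, 2, 1]
# [2, 2, 0]
# [1, 0, 2]
```

The main diagonal holds the number of documents in which each city occurs, and every off-diagonal cell counts the documents shared by a pair of cities, which is the value later used for the links.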