6. The Geography of Science and Innovation

This lesson is concerned with the spatialities of science and innovation; with how the spatial structures in research and innovation emerge from the micro-behaviours of local agents; with how, in the absence of central coordination or direction, the economic landscape exhibits self-organisation; and with how the processes of path creation and path dependence interact to shape geographies of economic development and transformation, and why and how such processes may themselves be place dependent.

Concepts and ideas from evolutionary economics (and evolutionary thinking more broadly) help to interpret and explain how the economic landscape changes over historical time. They also reveal how situating the economy in space adds to our understanding of the processes that drive economic evolution; that is, they show how geography matters in determining the nature and trajectory of the evolution of the economic system.

Core concepts: evolutionary models of the geography of innovation

Evolutionary Economic Geography argues that concepts and ideas from evolutionary economics (and evolutionary thinking more broadly) help to interpret and explain how the economic landscape changes over historical time, and that situating the economy in space adds to our understanding of the processes that drive economic evolution. In other words, geography matters in determining the nature and trajectory of the evolution of the economic system. Evolutionary economic geographers address the spatialities of economic novelty; how the spatial structures of the economy emerge from the micro-behaviours of economic agents; how, in the absence of central coordination or direction, the economic landscape exhibits self-organisation; how the processes of path creation and path dependence interact to shape geographies of economic development and transformation; and why and how such processes may themselves be place dependent.

Heimeriks and Boschma (2013) and Boschma, Heimeriks and Balland (2014) explore the worldwide spatial evolution of scientific knowledge production in biotechnology in the period 1986-2008. They employ a new methodology that identifies new key topics in biotech from the frequent use of title words in major biotech journals, taken as an indication of new cognitive developments within this scientific field. The analyses show that biotech is subject to a path- and place-dependent process of knowledge production, with a high degree of recurrence of similar key topics in consecutive years. Furthermore, slow-growth cities in biotech are characterized by topics that are less technologically related to other topics, while high-growth cities contribute to topics that are more related to the entire set of existing topics.

An excellent survey of the literature is provided by David Rigby. The Papers in Evolutionary Economic Geography (PEEG) series presents a great selection of relevant articles.

  • Design a model of the innovative performance of a region based on an evolutionary model. Justify your choices.
  • Design indicators of innovative performance according to your model. Justify your choices.
  • Collect data and interpret the results. What do the outcomes of your model mean?

Additional literature: Spiky World

Richard Florida shows in The Atlantic Monthly that, by almost any measure, the international economic landscape is far from evenly distributed. ‘Our world is amazingly spiky. In terms of both sheer economic activity and cutting-edge innovation, surprisingly few regions truly matter in today’s global economy’ (Florida, 2005: p. 48). This uneven distribution becomes clear when measuring the clustering of people, economic activity and patents. Population and economic activity are both unevenly distributed, but it is innovation, the engine of economic growth, that is most concentrated (Florida, 2005: p. 49). According to the distribution of patents, only a few places in the world produce most innovations.

Today cities are once again on the rise. With more than half the world’s population currently living in cities—more than three billion people—and an estimated sixty million more moving to cities every year, cities are cauldrons of human creativity. According to Florida, in a world now driven by creativity and ideas, our cities drive all future innovation.

Florida describes the Creative Class as a class of workers whose job is to create meaningful new forms (2002). It is composed of scientists and engineers, university professors, poets and architects, and also includes “people in design, education, arts, music and entertainment, whose economic function is to create new ideas, new technology and/or creative content” (Florida, 2002, p. 8). The designs of this group are seen as broadly transferable and useful. Another sector of the Creative Class includes positions that are knowledge intensive; these usually require a high degree of formal education (Florida, 2002). Examples of workers in this sector are health professionals and business managers, who are considered part of the sub-group called Creative Professionals. Their primary job is to think and create new approaches to problems. Creativity is becoming more valued in today’s global society. Employers see creativity as a channel for self-expression and job satisfaction in their employees. About 38.3 million Americans, roughly 30 percent of the American workforce, identify themselves with the Creative Class. This number has increased by more than 10 percent in the past 20 years.

The Creative Class is also known for its departure from traditional workplace attire and behavior. Members of the Creative Class may set their own hours and dress codes in the workplace, often reverting to more relaxed, casual attire instead of business suits and ties. Creative Class members may work for themselves and set their own hours, no longer sticking to the 9–5 standard. Independence is also highly regarded among the Creative Class and expected in the workplace (Florida, 2002).

Scaling laws

The bigger the city, the more productive. Big city dwellers are richer, more creative and more innovative than residents of small towns. At the same time residents of the largest cities more often fall victim to crime and disease than residents of smaller towns.

Big cities are also generally more productive than small towns in doing scientific research, but the patterns are complex. Mega cities prove to be less productive than one would expect based on their size. The results appeared on October 29 in PLOS ONE.

Many phenomena scale with the number of inhabitants of a city. As city size increases, per capita wealth increases by approximately 15%, and per capita patent output increases by 27%. These striking patterns are referred to as scaling laws, and they make it tempting to benchmark individual cities against the performance expected for their size. Indeed, it has been argued that scaling laws can provide a building block for a new science of performance-based planning.

The paper shows that such an approach does not apply well to the production of scientific knowledge. Data on scientific output of U.S. cities show that the per capita number of scientific papers does indeed increase with city size and the exponent is exceptionally high. However, many mid-sized cities publish much less than would be predicted by the scaling law, while for some other exceptional “science cities,” publishing performance is much higher than would be predicted from their modest size.
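To make the benchmarking idea concrete: a scaling law is usually written as Y = Y0 · N^β, where N is city size, Y is total output (for example papers) and β is the scaling exponent. Taking logarithms gives a straight line, log Y = log Y0 + β · log N, so β can be estimated with an ordinary least-squares fit, and a city's deviation from the fitted line shows whether it publishes more or less than its size would predict. The following is only a minimal sketch of that calculation; the function names and the city figures are hypothetical and are not taken from the paper.

```python
import math


def fit_scaling_law(populations, outputs):
    """Estimate beta and Y0 in Y = Y0 * N**beta by least squares on log-log data."""
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in outputs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                       # slope of the log-log line
    y0 = math.exp(mean_y - beta * mean_x)  # intercept, transformed back
    return beta, y0


def log_residuals(populations, outputs, beta, y0):
    """Log of observed over expected output; > 0 means above the scaling law."""
    return [math.log(y) - math.log(y0 * n ** beta)
            for n, y in zip(populations, outputs)]


# Hypothetical cities (made-up numbers, for illustration only).
cities = ["A", "B", "C", "D", "E"]
population = [80_000, 250_000, 900_000, 3_000_000, 12_000_000]
papers = [400, 900, 7_000, 30_000, 90_000]

beta, y0 = fit_scaling_law(population, papers)
for name, res in zip(cities, log_residuals(population, papers, beta, y0)):
    print(f"city {name}: log residual {res:+.2f}")
print(f"estimated exponent beta = {beta:.2f}")
```

An exponent β above 1 means that per capita output grows with city size: doubling N multiplies per capita output by 2^(β−1). The residuals are the benchmark scores discussed in the text: strongly positive values correspond to “science cities”, strongly negative values to cities that underperform for their size.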

In the very largest cities, the scaling values were remarkably low, which makes these mega cities relatively unattractive locations for scientific research. New York, Los Angeles and Chicago perform far below what would be expected from their size. This suggests that the benefits such mega cities could offer do not outweigh their high cost of living.

The differences between disciplines are also large. While some disciplines (such as ‘Arts and Humanities’ and ‘Veterinary Sciences’) show low scaling values, other disciplines show strong patterns of scaling, in particular Life Sciences and Engineering. Research in these fields preferentially locates in larger cities, benefitting from the local presence of patients and companies engaged in research.

What insights can you offer on the (changing) role of cities in research and innovation? 

Core empirical analysis: 

Part 1: publication data

First we need the database file as created by SAINT in lesson 2.

To visualise this data geographically, we first geocode the address data using R. For this we need the ‘ResearchAddress’ table from the mdb file. We can either export it using MS Access, or read it directly from the file using the mdb.get() function of the Hmisc package in R.

For a single table, the easiest option is to export the table as a csv or txt file from Access.

This can be imported in RStudio using the ‘Import Dataset’ button.

The R file Geocoder.r (on Google Drive) contains instructions and scripts to geocode this table.

The resulting file geocoded_addresses.csv can be imported in MS Access as a text file. In the import wizard, make sure to check the “First row contains field names” box and to create the primary key from the column “ID”.

A query can now create nodelists and edgelists with geographic data which can be drawn on a worldmap.

First, we create a nodelist from the geocoded address table:

In MS Access, create a new query (Query Design), close the table selection dialog, and go to the SQL view.

Now paste the following code where it currently says ‘SELECT;’.

SELECT Geocoded_addresses.ID, Geocoded_addresses.lat, Geocoded_addresses.lon

FROM Geocoded_addresses;

If you save the query, you can export the results to a text file. Make sure to use a dot [.] as decimal sign, otherwise Gephi will not accept the coordinates as numbers (these settings can be found in the [advanced…] menu in the export process of Access). Also make sure that you include a column name line; Gephi requires one.

Next, to create an edgelist for co-authorships, create a new query in MS Access, go to the SQL view and paste:

SELECT Geocoded_addresses.ID AS Source, Geocoded_addresses_1.ID AS Target, 'undirected' AS Type, Count(Couple_Articles_Authors_1.Articles_ID) AS Weight

FROM (Couple_Articles_ResearchAddresses_Authors AS Couple_Articles_ResearchAddresses_Authors_1 INNER JOIN ((Couple_Articles_ResearchAddresses_Authors INNER JOIN Geocoded_addresses ON Couple_Articles_ResearchAddresses_Authors.ResearchAddresses_ID = Geocoded_addresses.ID) INNER JOIN (Couple_Articles_Authors INNER JOIN Couple_Articles_Authors AS Couple_Articles_Authors_1 ON Couple_Articles_Authors.Articles_ID = Couple_Articles_Authors_1.Articles_ID) ON (Couple_Articles_ResearchAddresses_Authors.Articles_ID = Couple_Articles_Authors.Articles_ID) AND (Couple_Articles_ResearchAddresses_Authors.Authors_ID = Couple_Articles_Authors.Authors_ID)) ON (Couple_Articles_ResearchAddresses_Authors_1.Articles_ID = Couple_Articles_Authors_1.Articles_ID) AND (Couple_Articles_ResearchAddresses_Authors_1.Authors_ID = Couple_Articles_Authors_1.Authors_ID)) INNER JOIN Geocoded_addresses AS Geocoded_addresses_1 ON Couple_Articles_ResearchAddresses_Authors_1.ResearchAddresses_ID = Geocoded_addresses_1.ID

GROUP BY Geocoded_addresses.ID, Geocoded_addresses_1.ID

HAVING (((Geocoded_addresses.ID)>[Geocoded_addresses_1].[ID]));

This should create the following design view (which can, of course, also be constructed manually):

Save the query and name it (for example ‘co-authorship geocoded edgelist’).

This list has the address IDs rather than the author IDs as source and target, which is required for the visualisation.
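For readers who find the nested joins hard to follow, the logic of the edgelist query can be sketched in a few lines of Python. This is an illustration of what the query computes, not a replacement for it. It assumes the link table has been reduced to one (article ID, author ID, address ID) tuple per row and that every address has been geocoded; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import product


def coauthorship_edges(links):
    """Build an undirected, weighted edgelist between address IDs.

    `links` holds (article_id, author_id, address_id) tuples. Two addresses
    are linked once for every pair of author-address rows on the same article.
    """
    addresses_per_article = defaultdict(list)
    for article_id, _author_id, address_id in links:
        addresses_per_article[article_id].append(address_id)

    weights = defaultdict(int)
    for addresses in addresses_per_article.values():
        for source, target in product(addresses, repeat=2):
            # Like the HAVING clause: keep each pair once, drop self-loops.
            if source > target:
                weights[(source, target)] += 1

    return [(s, t, "undirected", w) for (s, t), w in sorted(weights.items())]


# Hypothetical sample: two articles, three authors, three addresses.
sample = [
    (1, "a", 10), (1, "b", 20),
    (2, "a", 10), (2, "c", 20), (2, "c", 30),
]
for edge in coauthorship_edges(sample):
    print(edge)
```

The condition source > target plays the same role as the HAVING clause in the query: each pair of addresses appears only once, and an address is never linked to itself.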

You will now have two text files: a nodelist and an edgelist.

You can import these in Gephi on the Data Laboratory page. It is very important that you import the ‘lat’ and ‘lon’ columns of the nodelist as numeric fields. The default field type is ‘string’, which is wrong here; use ‘double’ instead. For the edgelist, the column ‘weight’ should also be numeric, of type ‘double’ or ‘float’.

In Gephi you now need the plugins ‘GeoLayout’ and ‘MapOfCountries’ (and ‘ExportToEarth’ if you would like to show your network on Google Earth).

Go to Tools > Plugins to install these plugins.

You can now select GeoLayout from the layout menu, and after that use Map of Countries to draw a map. Make sure to uncheck ‘centered’ in both of them, otherwise your network and the map may not align.

You should now have a nice social network on a world map.

  • If you don’t like the map drawn by MapOfCountries, it is also possible to export the SVG file and use software like InkScape or Illustrator to paste the network on top of a nicer map image. Make sure you get the right projection (e.g. Mercator).

Part 2: Patent data

One of the benefits of patent data is the availability of address data. However, the address information in PATSTAT is ‘dirty’ data: it is not uniform, has missing values and is not always unambiguous. If you have a dataset of 100 patents, you can clean these data manually. If you have millions of patents, however, you will be glad to have the REGPAT database as created and maintained by the OECD [http://www.oecd.org/sti/inno/40794372.pdf].

In addition to cleaned addresses, REGPAT assigns TL2 and TL3 regions to about 5 million patents filed with the EPO or under the PCT. With this information we can create more geographical representations.

Imagine you want to know how knowledge on LED technology is spread around Europe. A quick lookup in the WIPO database shows that the IPC class for LED is ‘H01L 33/00’.

We can use this class in BigQuery to count, for each OECD TL2 region, how many patents have been applied for with the LED IPC class:

SELECT t3.up_reg_code as up_reg_code, count(DISTINCT t1.appln_id) as npatents

FROM [innometrics-1055:patentdata.tls209_appln_ipc] t1

INNER JOIN EACH [innometrics-1055:patentdata.regpat_all_inv] t2 on t1.appln_id = t2.Appln_id

INNER JOIN EACH [innometrics-1055:patentdata.regpat_regions] t3 on t2.Reg_code = t3.Reg_code

WHERE ipc_class_symbol LIKE "H01L%33/00%"

GROUP BY up_reg_code

As you can see, the space in the IPC class was replaced with a wildcard (%) because matching spaces in BigQuery gives unexpected behaviour, and a wildcard was added after the code in case the IPC class has any subclasses.
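To see which symbols such a pattern accepts, the LIKE semantics can be imitated locally: % matches any run of characters (including none) and _ matches exactly one character. The helper below is a hypothetical sketch and not part of BigQuery, and the example symbols, including their padding with spaces, are assumptions about how the IPC symbols are stored.

```python
import re


def like_match(value, pattern):
    """Return True if `value` matches the SQL LIKE `pattern` (case-sensitive)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any run of characters, possibly empty
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


# Assumed example symbols; the amount of padding in the real table may differ.
for symbol in ["H01L 33/00", "H01L  33/00", "H01L  33/0025", "H01L  33/02"]:
    print(repr(symbol), like_match(symbol, "H01L%33/00%"))
```

The first % absorbs any amount of padding between ‘H01L’ and ‘33/00’, and the trailing % accepts anything that follows ‘33/00’. A symbol such as ‘H01L 33/02’ is not matched by this pattern; to count it as well, the pattern would have to end in ‘33/%’.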

Download the results as a csv file and import it in RStudio.

Use the file geo-vis.R and the folder ‘functions’ to make the visualisation of this data (this file, along with its required function files can be found in the scripts folder on Google Drive).

The folder ‘functions’ and its contents need to be in your R working directory; it contains custom function definitions which are required for building the maps.

For the first example you can use the plotNUTS2 function, this will create a map of Europe.

You can do the same for the USA, but you will need to get a slightly different table from bigquery. This can be obtained with the following query:

SELECT SUBSTR(t2.reg_code, 3) AS region, count(DISTINCT t1.appln_id) as npatents, t3.reg_label as label

FROM [innometrics-1055:patentdata.tls209_appln_ipc] t1

INNER JOIN EACH [innometrics-1055:patentdata.regpat_all_inv] t2 on t1.appln_id = t2.Appln_id

INNER JOIN EACH [innometrics-1055:patentdata.regpat_regions] t3 on t2.Reg_code = t3.Reg_code

WHERE ipc_class_symbol LIKE "H01L%33/00%"

AND t2.reg_code LIKE "US%"

GROUP BY region, label

After importing it in RStudio, the data can be visualized with the plotUSTL3 function. This function will try to open a new window for the plot and puts a label on the top five regions.

Additional analysis: globe

For the last part we will project information on a 3D world globe: https://www.chromeexperiments.com/globe .

First we need a proper data file for this: the globe needs a JSON file with information on latitude, longitude and peak height. We can use R to generate such a file, but first we need the (geocoded) data.

Any geocoded data could be used, yet geocoding all patents with, for example, the LED IPC class poses some difficulties. First, Google will only geocode 2600 addresses per IP address per day; second, the graph will not be very clear and will take long to load because there are many data points. These difficulties can be addressed in a number of ways, so why not start with the simplest:

If we aggregate the data on TL3 regions, as we did in part 2, we have a limited number of addresses to geocode and can limit the number of datapoints. The geocoding of these regions can be done the same way as the research addresses in part one, but there is also a geocoded region table in BigQuery. We can use this table to modify the queries of part two to add latitude and longitude information:

SELECT t3.Reg_code as reg_code, t3.lat as lat, t3.lon as lon, count(DISTINCT t1.appln_id) as npatents

FROM [innometrics-1055:patentdata.tls209_appln_ipc] t1

INNER JOIN EACH [innometrics-1055:patentdata.regpat_all_inv] t2 on t1.appln_id = t2.Appln_id

INNER JOIN EACH [innometrics-1055:patentdata.geocoded_regions] t3 on t2.Reg_code = t3.Reg_code

WHERE ipc_class_symbol LIKE "H01L%33/00%"

AND t3.lat IS NOT NULL

GROUP BY reg_code, lat, lon

A csv export of this query can be converted into a proper datafile with the last part of the geo-vis.R file.
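To give an idea of what this conversion involves, here is a minimal sketch in Python. It is not the code in geo-vis.R. It assumes the csv has the columns reg_code, lat, lon and npatents produced by the query above, and it assumes the globe accepts the layout of the original WebGL Globe example: a list of series, each with a name and a flat list of latitude, longitude, magnitude triples, with magnitudes scaled to the range 0 to 1. The format expected by the course's own globe may differ, and the series name is arbitrary.

```python
import csv
import io
import json


def rows_to_globe_series(rows, name="LED patents"):
    """Turn rows with lat, lon and npatents into [[name, [lat, lon, mag, ...]]]."""
    points = [(float(r["lat"]), float(r["lon"]), float(r["npatents"]))
              for r in rows]
    peak = max(count for _, _, count in points)  # the tallest spike gets height 1.0
    flat = []
    for lat, lon, count in points:
        flat.extend([lat, lon, round(count / peak, 4)])
    return [[name, flat]]


# Hypothetical csv export with two regions (made-up coordinates and counts).
example_csv = io.StringIO(
    "reg_code,lat,lon,npatents\n"
    "NL41,51.5,5.1,200\n"
    "DE21,48.1,11.6,400\n"
)
series = rows_to_globe_series(csv.DictReader(example_csv))
print(json.dumps(series))
```

To convert a real export, open the csv file instead of the StringIO object and write the result to out.json with json.dump.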

If you have this file, there are two ways to make a globe of this. The easy way is going to http://innometrics.hutschemaekers.com  and uploading your json file there. You will get a code and a link from which you (and others) can access the globe.

The more challenging way (some knowledge of HTML and javascript is recommended) is setting up your own globe on a web server or your own pc.

The file globe.zip (on Google Drive) contains all files required to make your own globe.

If you extract this archive, the index.html file in the ‘globe-search’ subfolder works with the out.json file in the same folder. Replacing this file with your datafile will make your data load in the globe.

You can either upload the contents of globe.zip to a web server or use an application like Python to run an HTTP server on your own computer.
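As an illustration of the second option, the following sketch uses only the Python 3 standard library to serve a folder on your own computer. The function name, the folder name ‘globe-search’ (taken from the archive described above) and the port number are assumptions that you can change.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_globe_server(directory, port=8000):
    """Create a local HTTP server that serves the files in `directory`."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    # Bind to 127.0.0.1 so the globe is only reachable from this computer.
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# Example use (runs until you stop it with Ctrl+C):
#   server = make_globe_server("globe-search")
#   server.serve_forever()
# and then open http://127.0.0.1:8000/index.html in a browser.
```

The same result can be obtained without writing any code by running python3 -m http.server 8000 from inside the folder. A local server is needed because browsers generally refuse to load a JSON file from a page that is opened directly from disk.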

For (quite technical) instructions on how to do this and much more, visit http://www.html5rocks.com/en/tutorials/webgl/globe/  and http://versae.blogs.cultureplex.ca/2013/02/03/creating-a-globe-of-data-ph2/

References

Boschma, Ron, and Ron Martin. “The aims and scope of evolutionary economic geography.” The handbook of evolutionary economic geography (2010): 3-39.

Boschma, Ron, Gaston Heimeriks, and Pierre-Alexandre Balland. “Scientific knowledge dynamics and relatedness in biotech cities.” Research Policy 43.1 (2014): 107-114.

Florida, Richard. “The world is spiky: Globalization has changed the economic playing field, but hasn’t levelled it.” The Atlantic Monthly (2005): 48-51.

Heimeriks, Gaston, and Ron Boschma. “The path-and place-dependent nature of scientific knowledge production in biotech 1986–2008.” Journal of Economic Geography (2013): lbs052.

Leydesdorff, Loet, and Olle Persson. “Mapping the geography of science: Distribution patterns and networks of relations among cities and institutes.” Journal of the American Society for Information Science and Technology 61.8 (2010): 1622-1634.

Rigby, David L. “Technological relatedness and knowledge space: entry and exit of US cities from patent classes.” Regional Studies (2013): 1-16.

UNESCO science report 2010: The current status of science around the world. UNESCO Publishing, 2010.