3. Exploration of patent databases

In addition to scientometric data, patents provide a wealth of information to measure innovative developments. Like scientometric data, patent data are not without shortcomings. This lesson focuses on the difficulties involved in formulating an analytically meaningful conceptualization of technology. Furthermore, we look into recent research using patent data as indicators of technological activity. The conceptual and methodological problems of ‘measuring’ technology are discussed, with a classification of the types of information which can be drawn from patent databases of both innovations and the innovative efforts of firms and countries. The findings and the methodological strengths and weaknesses of such studies are reviewed.

Core concepts: patents as indicators of innovation

The dynamic capabilities model as defined by Teece (1994, 1997) analyzes the sources and methods of wealth creation and capture by private enterprise firms operating in environments of rapid technological change. The competitive advantage of firms is seen as resting on distinctive processes (ways of coordinating and combining), shaped by the firm’s (specific) asset positions (such as the firm’s portfolio of difficult-to-trade knowledge assets and complementary assets), and the evolution path(s) it has adopted or inherited.

Archibigu and Pianta (1996) provide an overview of research using innovation surveys and patent data as indicators of technological activity. The conceptual and methodological problems of ‘measuring’ technology are discussed, with a classification of the types of information which can be drawn from patent databases and from surveys of both innovations and the innovative efforts of firms. The findings and the methodological strengths and weaknesses of such studies are reviewed, considering first the evidence at the firm level, second the analysis of the industrial structure and finally the evidence at the country level and the process of globalization. The overview shows that rich and important evidence on the technological activities of firms is offered by these indicators. A summary of new departures for research based on innovation and patent data concludes the paper.

OECD Patent Statistics Manual (2009) provides basic information about patent data used in the measurement of science and technology (S&T), the construction of indicators of technological activity, as well as guidelines for the compilation and interpretation of patent indicators. Alongside other science and technology indicators, such as R&D expenditure and personnel, innovation survey data, etc., patents provide a uniquely detailed source of information on inventive activity. Patent data complement other S&T data, and it is generally good to use several types of data in conjunction (R&D, innovation, patents) as a means of cross-validation and to help in interpretation. These indicators have their strengths and weaknesses; they also reflect various stages in the innovation process. This manual is part of the “Frascati” family of OECD manuals, which includes the Frascati Manual on R&D, the Oslo Manual on innovation, the Technology Balance of Payments (TBP) Manual, and the Canberra Manual on human resources for science and technology.

Additional literature: the socio-economic implications of patents

Ove Granstrand describes the use of property rights to induce innovations of various kinds as the oldest institutional arrangement that is particular to innovation as a social phenomenon. It is then customary to refer to these rights as intellectual property rights (IPRs), comprising old types of rights such as patents for inventions, trade secrets, copyrights, trade marks and design rights, together with newer ones such as breeding rights, maskwork rights and database rights. These rights – although subsumed under the label IPRs, suggesting some coherence – in fact comprise a very heterogeneous set of rights with fragmented historical developments, hardly constituting what could be called an IPR system.

The “tragedy of the commons” metaphor helps explain why people overuse shared resources. However, the recent proliferation of intellectual property rights in biomedical research suggests a different tragedy, an “anticommons” in which people underuse scarce resources because too many owners can block each other. Privatization of biomedical research must be more carefully deployed to sustain both upstream research and downstream product development. Otherwise, more intellectual property rights may lead paradoxically to fewer useful products for improving human health. Further reading at (http://www.sciencemag.org/content/280/5364/698.short)

Core empirical analysis: patent data

Patent statistics are often based on using the database of the US Patent and Trade Office (USPTO) because this database is believed to provide us with a window on the remainder of the world: most companies will patent important inventions also in the USA . However, in parallel to the USPTO database, we also have the database of the European Patent Office (EPO), and the national patent databases. (Among them, the Nederlands Octrooicentrum.) Additionally, there is an international database at the World Intellectual Property Organization (WIPO).

Patent databases are official registrations and thus the sites are freely accessible. However, they are not all equally easy to use for research purposes.


A patent in USPTO can be broken down into many sections:

title (ttl)

abstract (abs)

description and specifications (spec)

claims (aclm)

These sections relate directly to the knowledge content within the patents (the “what” part), and other sections relate more to the “who”, “where”, “when” of the patent such as what company is the patent granted to (AN), what country the patentee is based (ACN), what the inventor’s name is (IN) and so on. Refer to the help section as described earlier for more examples.

In the help section click on “How to use the advanced search page” and you will see some examples of the nested quick expressions or logic operators and how they work.

Some search logic operators include: And, Andnot, Or

Query Example

CPC/Y02B10/12 AND APD/20060101->20061231

Let’s first turn to the USPTO database at http://patft.uspto.gov/netahtml/PTO/search-adv.htm. (This database can also be accessed at http://www.google.com/patents.) Click in the left column on Patents > Search > Advanced Search. Search with the following string: ttl/”renewable energy” (title includes “renewable energy”). If correct, you should have a return about 79 records. Study some of the records. Try breaking the search down into more components (ie. ttl/”geothermal energy”) and compare the results.

Extend your search to find inventors and/or assignees specifically located in the Netherlands using the corresponding search strings. Do not get disappointed with zero hits because the database is about inventions patented in the USA. Try a few other countries or in the case of the USA, use US-states as address fields. Try using different search criteria and terms. The USPTO itself provides statistics by country and by (US) patent class at http://www.uspto.gov/web/offices/ac/ido/oeip/taf/reports.htm

Espacenet at https://worldwide.espacenet.com/?locale=en_EP

B.EPO worldwide patent search engine

Almost all nations provide online access to their national patent databases. The European Patent Office provides an advanced search engine athttp://ep.espacenet.com/advancedSearch?locale=en_EP which allows you to search worldwide.

Or using similar query interface with USPTO https://worldwide.espacenet.com/?locale=en_EP

Index for field identifier for smart search in Espacenet


Patent analysis using PATSTAT

  1. Register for a free two month trial of EPO’s online PATSTAT database at https://data.epo.org/access-control/patstatsubscription.jsp
  2. Log on to the web interface of the database on at
  3. Click the database you want to enter, in this case the 2014 Autumn edition of PATSTAT will do.

A screen opens where on the left, you see all tables in the PATSTAT database, if you expand the tables, you can see all the columns in these tables. Hovering the tables and columns with your mouse will show some basic information.

On the right there are fields to enter queries (in MS SQL language), a message box and a query history, which will all be empty the first time you log on.
The query field is very important, here we can type (or paste) in the MS SQL language what information we want from the database.


You can find the table of your query result by clicking the “Table” row at the top of the interface. Then you can click download to save your result table as .csv file.


Let us try some example queries to explore the possibilities:

  1. To see all different years in which patents were filed, type or paste

[SELECT DISTINCT appln_filing_year FROM tls201_appln] (without the [])

And click the search icon or hit Ctrl + Enter
Click the [ Table ]  Tab in the top of the screen to see the results

As you can see, the different filing years appear in random order. To correct this we add
[ORDER BY appln_filing_year ] To the existing query.  Now the results are in chronological order.

  1. The information you just extracted is not very elaborate, and has very limited use. Suppose you would want so see the full contents of a table, say [tls901_techn_field_ipc]. Enter the query [SELECT * FROM tls901_techn_field_ipc], and you will get all the contents of the table indicating the IPC Maingroup symbol for each technological sector and technological field. (the ‘*’ basically means )
  2. Let’s say we want all the information in the [tls201_appln] table on applications in applied in 1990 with the Dutch patent authority. The following query does just that: [SELECT * FROM tls201_appln WHERE appln_filing_year = 1990 AND appln_auth = ‘NL’]

You can see there are a lot of columns, go back to the [Search] tab and explore the meaning of the columns of the tls201_appln table. The [appln_nr_epodoc] field can be used to look up all the patent information in Google Patents or EPO’s Espacenet .

  1. Next, lets combine some tables to generate results we can export, download and process.
SELECT  tls201_appln.appln_id, appln_nr_epodoc, appln_filing_year, ipc_subclass_symbol , techn_field FROM tls201_appln

INNER JOIN tls209_appln_ipc  on tls201_appln.appln_id = tls209_appln_ipc.appln_id

INNER JOIN tls901_techn_field_ipc  on tls209_appln_ipc.techn_field_nr = tls901_techn_field_ipc.techn_field_nr

WHERE  appln_auth = ‘NL’

AND appln_filing_year BETWEEN ‘2000’ AND  ‘2005’

AND techn_sector LIKE ‘Electrical engineering’

GROUP BY tls201_appln.appln_id, appln_nr_epodoc, appln_filing_year, ipc_subclass_symbol, techn_field

Take a minute to look at the results:

What do the IPC (sub)classes say about a patent?

What is the meaning of technological field, and technological sector?

  1. Now click [Download] > [Prepare download..] to start the export process

Select  the following settings:    [screenshot][http://goo.gl/v39QbP]
What:                            Result table
Format:                        CSV
Range:                          All

You will see a download manager popup indicating your download is being prepared.

  1. Click the popup, or go to [Download] > [Download manager..] to see the download, and click the save icon to download the dataset.
    zip should now be downloaded.
  2. Extract the zip file and store the resulttable.csv   somewhere you can find it.

This file resulttable.csv  is also in Datafiles > resulttable.csv on the innometrics google drive: https://drive.google.com/open?id=0B3YXebfRfhx3fjB2aHNUX1UtT0JuY3A5STFaMzlOalJQTDFnamo5LTNiN2VIUFFRRDg1ZDg

Download and install R and RStudio

  1. Download and install R at https://cran.rstudio.com/
  2. Go to https://www.rstudio.com/products/RStudio/#Desktop and download and install Rstudio Desktop
  3. Open Rstudio
  4. [Optional but recommended] Familiarize yourself with R using DataCamp https://www.datacamp.com/ or TryR http://tryr.codeschool.com/

Import and analyse the dataset in R/Rstudio

  1. Download the Tutorial_patent_analysis.R file in the Scripts folder on Google Drive:


  1. Open this file in RStudio, and follow the instructions

More PATSTAT queries.

De Rassenfosse, Dernis and Boedt (2014) discuss and explain a series of examples of PATSTAT queries in their working paper [http://gder.phpnet.org/rassenfosse//paper_An_introduction_to_the_Patstat_database_with_example_queries.html]

[Note] To get these examples to work in the online PATSTAT interface little trick has to be applied, because in the example the authors define a new table (our_sample) with Query 1, which is not possible in the online interface. In the following examples they call for the table our_sample, which doesn’t exist. To fix this issue, you can replace the our_sample with the first 10 lines of query 1, put between parentheses () .

SQL will execute this subquery between parentheses first, and create a temporary table with the results of it, which is similar to what De Rassenfosse et al. do when they define a table ‘our_sample’.

In addition, pay attention to any comments regarding MS SQL, because the online PATSTAT interface recently switched from MySQL to MS SQL.

(If you’re not able to get a query to work, the results of query 7 can be found in the google drive datasets folder)

Patent Quality

The PATSTAT database, used above, contains millions of patent applications with hundreds of attributes about each patent, including applicant, inventor, their addresses, citations etc. This wealth of information can be very useful to innovation scholars, yet has a number of limitations as well. One of the more pragmatic limitations is the ‘dirtiness’ of the data. Cleaning the data is often required before meaningful analysis can be done. Yet cleaning 80 million patent applications is something single humans will not be able to do in a lifetime.

Fortunately, numerous organisations and institutions have made efforts to clean, harmonize and enrich patent data, and these data are mostly freely available to scholars.

The OECD for example maintains a number of databases: [ http://www.oecd.org/sti/inno/oecdpatentdatabases.htm  ]

  1. Take a minute to read about these databases and think of possible uses.

While anyone can request access to these datasets for research use, it may take a few months before you get access. For this course, the de OECD datasets can be accessed through Google BigQuery (A MySQL-like database)

  1. Go to https://bigquery.cloud.google.com/

When logged on you see in the left column a shared project: [innometrics-1055:patentdata] with a number of tables (similar to the PATSTAT interface). Explore the different tables a little.

  1. To lookup a company in [201202_HAN_NAME] click ‘COMPOSE QUERY’ and enter the following query:

SELECT * FROM [innometrics-1055:patentdata.201502_HAN_NAME]
WHERE Person_name_clean LIKE ‘%DAF%’ AND Person_ctry_code like ‘NL’

This query asks for (SELECT) all data (*) in the (FROM) 201501_HAN_NAME table, located in the patendata dataset, which is part of the innometrics-1055 project, where the field Person_name_clean contains DAF (the % is a wildcard), and the country of origin is ‘NL’

We see a number of results including ‘DAF TRUCKS NV’.

  1. Try some other (larger) firms and you will see that even in the harmonize applicant name database many variations will exist.

The cleaned/harmonised names were matched against company names from business register  data  (as  provided  in  the ORBIS© database from  Bureau  van  Dijk  Electronic  Publishing, June 2011). Whether or not a match was found is indicated in the Column ‘Matched’

When we have a HAN_ID (of a firm), we can use this to get a firms patent portfolio from the table 201502_HAN_PATENTS. Please note that this table is limited to EPO, USPTO and PCT patents.

  1. To see the patent portfolio of DAF TRUCKS NV we use the following query:

SELECT * FROM [innometrics-1055:patentdata.201502_HAN_PATENTS] WHERE HAN_id = 540673

(Or to include multiple HAN_id’s)

SELECT * FROM [innometrics-1055:patentdata.201502_HAN_PATENTS] WHERE HAN_id in (540673, 540666)

This returns a number of patents, which besides a quantity don’t tell us too much about DAF TRUCKS NV’s patent portfolio. With the appln_id’s we just got we can get more information. We can for example link it to IPC classes, addresses, but also to the patent quality indicator database by the OECD.

  1. Run the following query to get all the IPC 4 digit class symbols for the patent portfolio of DAF TRUCKS NV.
SELECT t1.Appln_id as appln_id, ipc_subclass_symbol

FROM [innometrics-1055:patentdata.201502_HAN_PATENTS] t1

LEFT JOIN EACH [innometrics-1055:patentdata.tls209_appln_ipc] t2 ON t1.appln_id = t2.appln_id

WHERE HAN_id = 540673

GROUP BY appln_id, ipc_subclass_symbol

  1. Click Download as CSV to export the the data. This data can be imported and visualized in R similar to the previous exercise.

If you replace DAF by a firm with thousands of patents, this result table becomes very unclear. In such cases it is useful to summarize the results. This can be done in R, even Excel, but just as easily in SQL, where we can also easily add temporal information:

SELECT t3.appln_filing_year as year, ipc_subclass_symbol,  count(distinct t1.Appln_id) as Npatents

FROM [innometrics-1055:patentdata.201502_HAN_PATENTS] t1

LEFT JOIN EACH [innometrics-1055:patentdata.tls209_appln_ipc] t2 ON t1.Appln_id = t2.appln_id

INNER JOIN EACH [innometrics-1055:patentdata.tls201_appln] t3 ON t1.Appln_id = t3.appln_id

WHERE HAN_id = 540673

GROUP BY ipc_subclass_symbol, year

This query counts the number of applications for each application, for each year as Npatents.

  1. Find a nice way to visualize the IPC technology portfolio of the firm over the years

Another OECD database is the patent quality indicators  database.

  1. Now get the 5 and 7 year citation counts of the same patent (the one of DAF):
SELECT t1.Appln_id as appln_id, fwd_cits5, fwd_cits7 FROM [innometrics-1055:patentdata.201502_HAN_PATENTS] t1
LEFT JOIN EACH [innometrics-1055:patentdata.patent_indicators_ALL] t2 ON
t1.Appln_id = t2.appln_id
WHERE HAN_id = 540673
GROUP BY appln_id, fwd_cits5, fwd_cits7

You will notice the results contain both zero’s and null values. The zero’s represent patents with zero citations in the specified period, the null’s indicate no information available. These can easily be removed by changing the LEFT JOIN EACH command to INNER JOIN EACH.

  1. Try some other firms and see how often their patents are cited.

This times cited indicator is a popular indicator to measure the ‘impact’ of a particular patent. Many factors play a role in this impact  measure, and patent analysis can be used to identify some of them. We can for example look at the highest cited patents and explore their characteristics:

SELECT t1.appln_id as appln_id, fwd_cits5, appln_filing_year, nb_applicants, nb_inventors,

GROUP_CONCAT(UNIQUE(t3.ipc_subclass_symbol)) as ipc_classes, docdb_family_id, docdb_family_size

FROM [innometrics-1055:patentdata.patent_indicators_ALL] t1

INNER JOIN EACH [innometrics-1055:patentdata.tls201_appln] t2 on t1.appln_id = t2.appln_id

INNER JOIN EACH [innometrics-1055:patentdata.tls209_appln_ipc] t3 on t1.appln_id = t3.appln_id

GROUP BY appln_id, fwd_cits5, appln_filing_year, nb_applicants, nb_inventors, docdb_family_id, docdb_family_size

ORDER BY fwd_cits5 DESC


  1. What are your observations? What is a docdb_family?

Besides the forward citations the Patent Quality Indicator database has a range of other measures.

  1. Take a look at all the fields in the patent_indicators_ALL table. All these values are explained in the paper by Squicciarini, Dernis and Criscuolo (2013) [http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=DSTI/DOC(2013)3&docLanguage=En]
  2. Try some other variables in the patent_indicators dataset and explain what they tell us.

Additional empirical analysis:  patent citations

In addition to national patent offices, the EU has developed a European Patent Office (EPO). The office of the World Intellectually Property Organization can issue so-called PCT-patents. PCT stands for Patent Collaboration Treaty. To harmonise patent processes across the world, the OECD states that a patent is a member of a patent family (such as the one above) if and only if it is filed at the European Patent Office (EPO), the Japanese Patent Office (JPO), and is granted by the US Patent & Trademark Office (USPTO) (Eurostat, 2006).

Patents contain vast amounts of technical data, consisting of information pertaining to the assignee and country of assignee amongst many data variables and the data contained is supplied on an entirely voluntary basis which makes them important if only considering the information contained within.

Patent citations

Patent citations work in much the same way as academic paper citations work except that instead of the citation being based on a voluntary scheme (such as with academic papers, where you only cite authors when you use some of their ideas etc), patent citations are added not only by the applicants of the patent, but also of the examiners of the patent application. Patents citations are determined by the examiner who, with the help of the data supplied by the applicant and their attorney, determines what citations are relevant or not.

Of course it’s not only citations within patents that can help an analysis, various other data contained within the patent documents also shed light on the subject you are investigating.

Almost all nations provide online access to their national patent databases. The European Patent Office provides an advanced search engine at http://ep.espacenet.com/advancedSearch?locale=en_EP which allows you to search worldwide. The World Intellectual Property Organization (WIPO) provides the so-called PCT patents online at http://www.wipo.int/pctdb/en/ . Only the USPTO database contains also the citation information. Note that the number of citations of a patent can increase day-by-day. Thus, it is important to note the date that you access the site.

Go to the USPTO database online at www.uspto.gov and in the left column, click on “patents”, then on “search patents”. On the left hand side of the screen, click on “advanced search” and it will take you to a basic search screen. The “query” box is where you would input various searches. Remember though that it is not a simple word search such as with Google, but the USPTO uses field codes to help narrow your search. The explanations for the various field codes can be found below the query box and if you click on any of them it will give you a more detailed description of what is involved.

Let’s do a basic search:

In the query box type “ttl/computer”. This will provide results for all patents that have the word “computer” in the title. There should be more than 25,000 search results. The first result is the newest patent granted with the search word in the title. Now if we change the search to “computer interface” as the patent title, see what you get. You need to add the quotation marks around a group of phrase.

Let’s say we’re interested in touch activated computer interfaces, if we modify the search to ttl/ “touch activated computer interface” we get 0 results. But if we now include another operator term, spec, to the search terms as such,

ttl/”computer interface” and spec/touch

we get 60+ results. Adding different search terms allows us to delve deeper into each patent. The “spec” term signifies that the search must look into the description and specifications of the patent but the same words must appear in the title. If we were to use the word “or” instead of “and” we would get over 117 000 results. This is because of the basic logic operators the search uses.

A patent can be broken down into many sections:

  • title (ttl)
  • abstract (abs)
  • description and specifications (spec)
  • claims (aclm)

These sections relate directly to the knowledge content within the patents (the “what” part), and other sections relate more to the “who”, “where”, “when” of the patent such as what company is the patent granted to (AN), what country the patentee is based (ACN), what the inventor’s name is (IN) and so on. Refer to the help section as described earlier for more examples.

In the help section click on “How to use the advanced search page” and you will see some examples of the nested quick expressions or logic operators and how they work.

Some search logic operators include:




Have a look through the “help” section on the advanced search page, and click on “tips on field searching” to familiarise yourself with some of the search language involved, and how to correctly use the nested quick expressions.

For practice, search for patents issued between January 2000 and September 2006 with the title containing the word LED but not related to flashlights that use LEDs. You should get 885 patents. Remember what your content search terms as well as your operator search terms are. Wild card operators in the USPTO database are signified by a $.

Isd/200001$->200609$ and ttl/(LED andnot flashlight) andnot spec/flashlight andnot aclm/flashlight

Now that you have some of the basics down, and you have narrowed the list of patents that you think are relevant to the research and analyses you want to perform, you can download the relevant patents to your computer. The greatest benefit of having an automated download is that you do not have to click on each patent, then save it as html, and remembering what order you saved them in. Of course in the case of you needing only ten patents, you could do that, but if you want to download 900 patents then you will regret not using an automated program.

To do this, first define your search in the advanced search page. Only when you are happy with your search terms and the expected results (sometimes you may have to cast your search in wider terms to be sure you have collected everything that is relevant, because it is easier to delete what you don’t need in your analysis than to have to repeat the search process to find all that you need).

Once your results have been returned, you will see a total number of patents returned.

My search terms were:

ttl/((blu-ray or bluray) andnot (hd-dvd or hddvd)) or abst/((blu-ray or bluray) andnot (hd-dvd or hddvd)) or spec/((blu-ray or bluray) andnot (hd-dvd or hddvd))

It returned 94 patents. (Of course, you can use other search terms.) Click on the button “next 50 hits” and the click on the 4th or 5th patent down the list. Now that you have the patent in front of you, have a look through and read the text, just to see what a patent looks like. We have the title, abstract, inventor name, assignee name, US class, references, claims, description and so on, all of which can be used as search terms mentioned earlier.

Copy the URL of the patent into a text file and read it. Here is the URL of one of the patents I asked you to search for earlier regarding LEDs.


If you examine it, you can see how the USPTO patent results come about. You can see the operator terms, the search words, the patent number and if you look at the highlighted sections, you can see which result on which page your patent was. In this case, result number 64 on page 2. These two terms are what directs the automated download program along with your search terms.

Our analysis requires downloading of the following computer programs (from http://www.leydesdorff.net/software/patentmaps/index.htm):

  1. uspto1.exe for the initial download at the interface of USPTO (advanced searching at http://patft.uspto.gov/netahtml/PTO/search-adv.htm);
  2. uspto2.exe for the organization of the databases from the retrieval under 1;
  3. patref3.exe for downloading the numbers of citations for each patent downloaded under 1; uspto2.exe generates an input file for this routine;
  4. patref5.exe for the generation of the overlays for Google Maps.

All programs have to be saved in the same folder; they are all needed although one does not need to start them as a user (but one can). Files from previous runs may be overwritten; it is advised to empty the folder before downloading and running these routines in order to prevent confusion with previous runs, etc. (The routines access the Internet using the MicroSoft Internet protocol in the file MSINet.OCX. If this file was not yet installed when installing another program, an error message may be generated by Windows since the file is not installed with the original installation of Windows. This error can be solved by following the instructions at http://www.leydesdorff.net/software/patentmaps/ocx.htm.)


The first three routines are integrated into uspto2.exe. This routine calls both uspto1.exe and patref3.exe (which access the Internet). The user first is prompted for an input string. This search string can be generated and tested at http://patft.uspto.gov/netahtml/PTO/search-adv.htm. One needs a search string which results in more than fifty patents.


For example, the search string “ttl/nano and isd/2010$$” provides 171 patents. The result

The download program, uspto1.exe, uses Visual Basic coding to send requests to the USPTO database using the search terms above, but knowing how many results are displayed on one page, it also tells the database in essence to “turn the page” when R=50 or any multiple of 50. That way, the program clicks the “next 50 hits” button so you don’t have to.

Downloading the Patents

Open a new folder and place the uspto1.exe file into it. Run uspto1.exe and paste the same URL that you looked at earlier into the indicated space. (The program works only with an address like the above one, that is, provided for patent numbers larger than the first fifty. The first fifty contain a different format.) Enter the number of patents and click run.  In case of the Error-message: “MSInet.OCX is missing”, please consult http://www.leydesdorff.net/software/patentmaps/ocx.htm

The program will now start the download from the first patent to the indicated number. Once you have your patents in the designated folder, they appear as html files with the name p1.htm, p2.htm, p3.htm etc.

Analysis of Patents

To analyse the patents we open the program uspto2.exe in the same folder as your downloaded patents. When prompted enter how many patents are to be analysed.

This program will search the html files for key words, such as assignee and title, and convert them into dBase files, which are accessible in both Excel and Access. They will be saved in the same folder as your patents and will look like this (use these files only if you dont manage to obtain the files yourself!):

We will be using Access from this point on, so open Access on your computer. Open a new file and click on “blank database”. It will ask you to save it. Save it under whatever name you choose. The next screen will give you a smaller window with tables, queries, forms etc in the left column. Go to file, then get external data then import. Navigate to the files that uspto2.exe produced and double click on each one. Make sure to click only on the .DBF files, not the .DBT files. Once these have been imported, you will see them under the “tables” section in the smaller window. If you click on any of these, it will bring up the table related to them.

dBase’s underlying file format, the .dbf file, is widely used in applications needing a simple format to store structured data. However, recent versions of MS Access (2013) do no longer support .dbf files. One can still open these files using excel, and save them in another format for further processing (e.g importing them into MS Access).

Look at all of them and see what data each table holds. For example, clicking on the TI file brings up the patent number, the title, year, date, abstract application number etc. (One can use the titles for drawing a semantic map using ti.exe.) If you click on the USCLASS table, it brings up the technological class in which the patent was granted. (With a bit of creativity in database management, you can also export the classes so that you can draw a cosine-based map among them.) The numbers alongside signify if it is the original class (1) or cross-reference class (2-). If you right-click on any “1”, then click “filter by selection” it only shows records with 1 in that field, so showing the original classes of all the patents. As there are different tables, each containing different fields of interest, we need to link them to make sense of them. To do this, go to the main window (F11) and click on the “relationships button” on the main menu. A smaller window should open that asks you which tables you want to add together. Highlight each table you want, then click “add”. In our case, let’s say we want to see the assignee name, the patent number, what primary class the patent is in, what country the assignee calls home and what year the patent was issued. So highlight TI, ASS, USCLASS and INV then click “add”. Each table now pops up on the working window. You can see that each has a scroll down list of what characteristics it has inside. Since we want to show what class, country, assignee and year our patents belong too, we need to link the tables using a unique identifier. Our unique identifier is the order in which it was downloaded, as it is the same for all the tables. So find the “nr” in each table and click and drag it to the “nr” in a different table. It will ask you to create a relationship. Click yes. An example is as shown:

Figure 1. Relationship window in Access

Once you’ve done that, close the relationship window and save it when prompted.

Now click on Queries in the main window, and create query in design window. It will now show the same window as for the relationship one. Highlight the tables that have the relevant data then click add. The tables will appear in a grey window and then click and drag the relevant sections from each table that you want to appear.

Figure 2. Query window with fields of interest.

Once you have that, click on the red exclamation mark at the top of the screen to display all your results.

From the results that come up, you can see that some patents are shown more than once, but this is due to the patent belonging to many different US classes, so right-click on the 1 in the CLASSNR field and exclude the other results. You may still see some duplicate records but these are due to the way Access treats each record. To change this, go to the query work window by clicking on the view icon in the main menu (it looks like a pencil and protractor). Then right click on the grey area, then properties and change unique values to “yes”. Then click the red exclamation mark again.


Archibugi, Daniele, and Mario Planta. “Measuring technological change through patents and innovation surveys.” Technovation 16.9 (1996): 451-519.

Granstrand, Ove. “Innovation and intellectual property rights.” The Oxford handbook of innovation (2005): 266-290.

Heller, Michael A., and Rebecca S. Eisenberg. “Can patents deter innovation? The anticommons in biomedical research.” Science 280.5364 (1998): 698-701.

Zuniga, P., Guellec, D., Dernis, H., Khan, M., Okazaki, T., & Webb, C. (2009). OECD patent statistics manual.