MMI Lesson 2: Exploration of patent databases

2.1. Exploration of patent databases

In addition to scientometric data, patents and surveys provide a wealth of information for measuring innovative developments. Like scientometric analyses, patentometric and survey-based analyses are not without shortcomings.

Archibugi and Pianta (1996) provide an overview of recent research using innovation surveys and patent data as indicators of technological activity. The conceptual and methodological problems of ‘measuring’ technology are discussed, with a classification of the types of information which can be drawn from patent databases and from surveys of both innovations and the innovative efforts of firms. The findings and the methodological strengths and weaknesses of such studies are reviewed, considering first the evidence at the firm level, second the analysis of the industrial structure and finally the evidence at the country level and the process of globalization. The overview shows that rich and important evidence on the technological activities of firms is offered by these indicators. A summary of new departures for research based on innovation and patent data concludes the paper.

OECD Patent Statistics Manual (2009) provides basic information about patent data used in the measurement of science and technology (S&T), the construction of indicators of technological activity, as well as guidelines for the compilation and interpretation of patent indicators. Alongside other science and technology indicators, such as R&D expenditure and personnel, innovation survey data, etc., patents provide a uniquely detailed source of information on inventive activity. Patent data complement other S&T data, and it is generally good to use several types of data in conjunction (R&D, innovation, patents) as a means of cross-validation and to help in interpretation. These indicators have their strengths and weaknesses; they also reflect various stages in the innovation process. This manual is part of the “Frascati” family of OECD manuals, which includes the Frascati Manual on R&D, the Oslo Manual on innovation, the Technology Balance of Payments (TBP) Manual, and the Canberra Manual on human resources for science and technology.

Please use 6 slides to describe the use of patent indicators: how did they come into being, what do they measure, and what are their shortcomings?

Sahal (1981) clarifies some of the difficulties involved in formulating an analytically meaningful conceptualization of technology. Over the years two main viewpoints of technology have claimed a degree of acceptance. First, there is the production function concept of technology originating from the neoclassical economic theory of growth and capital. Secondly, there is what might be termed the Pythagorean view of technology in terms of patent statistics, frequency of publications and the like. Sahal outlines a third, emerging systems concept of technology according to which innovation is best conceived in terms of its functional properties. That is, a technology is as a technology does.

Further reading on the history and development of IPR is provided by Granstrand. Ove Granstrand describes the use of property rights to induce innovations of various kinds as the oldest institutional arrangement that is particular to innovation as a social phenomenon. It is then customary to refer to these rights as intellectual property rights (IPRs), comprising old types of rights such as patents for inventions, trade secrets, copyrights, trade marks and design rights, together with newer ones such as breeding rights, maskwork rights and database rights. These rights – although subsumed under the label IPRs, suggesting some coherence – in fact comprise a very heterogeneous set of rights with fragmented historical developments, hardly constituting what could be called an IPR system.

The “tragedy of the commons” metaphor helps explain why people overuse shared resources. However, the recent proliferation of intellectual property rights in biomedical research suggests a different tragedy, an “anticommons” in which people underuse scarce resources because too many owners can block each other. Privatization of biomedical research must be more carefully deployed to sustain both upstream research and downstream product development. Otherwise, more intellectual property rights may lead paradoxically to fewer useful products for improving human health. Further reading at (http://www.sciencemag.org/content/280/5364/698.short)

Please provide some interesting insights from these articles.

Patent statistics are often based on the database of the US Patent and Trademark Office (USPTO) because this database is believed to provide a window on the rest of the world: most companies also patent their important inventions in the USA (Jaffe & Trajtenberg, 1982). However, in parallel to the USPTO database, we also have the database of the European Patent Office (EPO) and the national patent databases (Sheu et al., 2006), among them that of the Nederlands Octrooicentrum. Additionally, there is an international database at the World Intellectual Property Organization (WIPO).

Patent databases are official registrations and thus the sites are freely accessible. However, they are not all equally easy to use for research purposes.

Please discuss two research articles that are based on patent analysis.

1.1 USPTO

Let’s first turn to the USPTO database at http://patft.uspto.gov/netahtml/PTO/search-adv.htm. (This database can also be accessed at http://www.google.com/patents.) Click in the left column on Patents > Search > Advanced Search. Search with the following string: ttl/"renewable energy" (title includes “renewable energy”). If the query is correct, you should get about 21 records. Study some of the records. Try breaking the search down into more specific components (e.g., ttl/"geothermal energy") and compare the results.

Extend your search to find inventors and/or assignees specifically located in the Netherlands using the corresponding search strings. Do not get disappointed with zero hits because the database is about inventions patented in the USA. Try a few other countries or in the case of the USA, use US-states as address fields. Try using different search criteria and terms. The USPTO itself provides statistics by country and by (US) patent class at http://www.uspto.gov/web/offices/ac/ido/oeip/taf/reports.htm .

2.2 EPO and WIPO

Let’s repeat the exercise at the European Patent Office and World Intellectual Property Organization databases.

The EPO database can be found at http://ep.espacenet.com/ . Always use the advanced options for bibliometric searching. This time we can find approximately 250 records with “renewable energy” in the title (using the search option “Worldwide”). Can you explain the difference? Would you know a way to refine your searches at the EPO? (I was not able to find it.) Explore also the option of “Classification search”.

The WIPO database can be found at http://www.wipo.int . On the search page for patents you can find the “Field codes” (that is, the searchable terms) on the upper right side. Repeat the searches which you did above for the USPTO database and compare the results.

What are your conclusions from comparing these three databases? Consider also the advantages and disadvantages of using http://www.freepatentsonline.com/search.html as an alternative. How many patents can you find for ttl/"renewable energy" using this database? Can you explain the difference?

2.3. Corporate Invention

The Corporate Invention Board’s website provides a new tool that aims to characterize the nature and extent of technological globalisation. It makes it possible to track and analyze the transformation of the global patent portfolios of industrial groups over time. It also identifies the geographic origin of patent-protected inventions.

The Corporate Invention Board’s website complements the “Industrial R&D Investment Scoreboard” (produced by the Institute for Prospective Technological Studies). The Industrial R&D Investment Scoreboard, an annual study of the European Commission, analyzes the performance of the 2000 industrial companies (1000 based within the European Union, 1000 outside) with the largest annual R&D investments. These 2000 companies accounted for more than 430 billion euros of investment in 2008. Through patent statistics, the Corporate Invention Board focuses on the outputs of these R&D investments. Thus, the Corporate Invention Board provides information on the technologies and on the location of these investments.

The Corporate Invention Board’s website is built on an original database which combines information extracted from the “Patstat” patent database and from the “Orbis” financial database (details available on the Methodology pages and Data sources).

2.4. National patent portfolios

Loet Leydesdorff created the following picture for the patent portfolio of China in 2005.

This picture is based on the International Classification Codes retrieved at the WIPO database using 1128 patents. The file contains 83 classification codes of which 65 are related. The pattern is shown in this picture. The nodes are sized in accordance with the logarithm of the number of patents in the corresponding category.

Try to generate this picture using the input file which is available at http://www.leydesdorff.net/wipo/china.txt . This is an input file for Pajek.
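The construction described above (classes as nodes, co-classification as links, node size following the logarithm of the patent count) can be sketched in Python. This is only an illustration under stated assumptions: the toy `patents` data and the function names are hypothetical, the sketch does not read china.txt, and the use of log(count + 1) rather than log(count) is our own choice so that a class with a single patent does not get size zero.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy input: the set of classification codes for each patent.
patents = [
    {"H04L", "G06F"},
    {"H04L", "H04W"},
    {"G06F"},
    {"H04L", "G06F", "H04W"},
]

def build_network(patents):
    """Count patents per class and co-classifications per pair of classes."""
    node_counts = Counter()
    edge_counts = Counter()
    for classes in patents:
        node_counts.update(classes)
        for a, b in combinations(sorted(classes), 2):
            edge_counts[(a, b)] += 1
    return node_counts, edge_counts

def node_sizes(node_counts):
    """Node size as log(count + 1); the +1 offset is an assumption of this sketch."""
    return {label: math.log(count + 1) for label, count in node_counts.items()}

def to_pajek(node_counts, edge_counts):
    """Render the network as text in Pajek's .net layout (1-based vertex numbers)."""
    labels = sorted(node_counts)
    index = {label: i + 1 for i, label in enumerate(labels)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Edges")
    for (a, b), weight in sorted(edge_counts.items()):
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines)

node_counts, edge_counts = build_network(patents)
pajek_text = to_pajek(node_counts, edge_counts)
print(pajek_text)
```

Saving `pajek_text` to a file with the extension .net gives a network that Pajek can open; classes that never co-occur with another class appear as isolated vertices.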

2.5 Patents and patent citations

Patents and patent citations have been used by many authors to shed light on the innovative processes and products resulting from years of research and development within a firm or institutional setting. Patents and patent citations contain much data that may help a researcher analyse the various inputs and outputs of the firms and institutions to which the patents were granted. But first, in order to recognise which data are important to a researcher, we must look at what exactly a patent is.

Patents are, in very basic terms, the right to appropriate returns from research (Reitzig 2004). They, in effect, exclude other firms from practicing or producing the same processes and products. A patent delineates a piece of knowledge by placing, in writing, the knowledge contained within the claims and descriptions of the patent document into a legal realm where the knowledge is protected by law against infringement. In order for a patent to be granted, the knowledge contained within the claim must be novel, inventive, industrially applicable, and useful. The United States Patent and Trademark Office (USPTO) gives this definition of what a patent is:

The right conferred by the patent grant is, in the language of the statute and of the grant itself, “the right to exclude others from making, using, offering for sale, or selling” the invention in the United States or “importing” the invention into the United States. What is granted is not the right to make, use, offer for sale, sell or import, but the right to exclude others from making, using, offering for sale, selling or importing the invention. Once a patent is issued, the patentee must enforce the patent without aid of the USPTO. [1]

In addition to the national patent offices, European countries have established a European Patent Office (EPO). The office of the World Intellectual Property Organization (WIPO) can issue so-called PCT patents. PCT stands for Patent Cooperation Treaty. To harmonise patent processes across the world, the OECD states that a patent is a member of a triadic patent family if and only if it is filed at the European Patent Office (EPO) and the Japanese Patent Office (JPO), and is granted by the US Patent & Trademark Office (USPTO) (Eurostat, 2006).

Patents contain vast amounts of technical data, including information about the assignee and the assignee’s country among many other variables. Because this information is supplied on an entirely voluntary basis, patents are an important source even if one considers only the information they contain (Hall 2000). When considering the usefulness of patents in an analysis, it is important to note that the sheer number of patents is less important because the value of a patent may vary widely. A simple patent count may be used to gauge a firm’s R&D spending during a period, but numerous studies have shown that simple patent counts are not good indicators of much else (Trajtenberg 1990). Patents have been used to illustrate the value of a technology, but with limited success because the economic importance and value derived from patents vary so much (Trajtenberg 1990). Valuable data may be gathered not only from the information pertaining to the art itself, but also from references to other patents, which provide a wider sense of the state of the art in a specific technology and of the innovation within that field (Archibugi and Pianta 1996). In a study of the perceived value of patents, Harhoff et al. (Harhoff, Narin et al. 1999) found that the more often a patent was cited, the greater its economic worth, which leads us to discuss patent citations.

Patent citations

Patent citations work in much the same way as citations in academic papers, except that they are not based on a purely voluntary scheme (as with academic papers, where you cite authors only when you use some of their ideas, etc.): patent citations are added not only by the applicants of the patent, but also by the examiners of the patent application. Patent citations are determined by the examiner who, with the help of the data supplied by the applicant and their attorney, decides which citations are relevant (Leydesdorff 2006a). With these citations one can map, just as we did with the author and journal citations, the progress of the knowledge contained within the patent document. Trajtenberg (1990) argued that the number of citations of an individual patent is important and quoted the Office of Technology Assessment and Forecast (1976) to demonstrate this.

…During the examination process, the examiner searches the pertinent portion of the “classified” patent file. His purpose is to identify any prior disclosures of technology …which might anticipate the claimed invention and preclude the issuance of a patent; which might be similar to the claimed invention and limit the scope of the patent protection….;or which, generally, reveal the state of the art of the technology to which the invention is directed…If such documents are found they are made known to the inventor and are “cited” in any patent which matures from the application…Thus, the number of times a patent document is cited may be a measure of its technological significance.

The number of citations a patent receives can also be seen to be linked to the market value of the company owning the patent and to the value of the technology (Hall, Jaffe et al. 2005). What we see is that it is not just which patents cite yours, but also how many, that may determine the eventual value of your patent and, accordingly, your product.

Patent citation analysis is a recent development which uses bibliometric techniques to analyse the wealth of patent citation information. Karki (1997) describes the various facets of patent citations and patent citation studies, and their important applications. Construction of technology indicators being an important use of patent citations, various patent citation based technological indicators and their applications are also described.

Of course, it is not only the citations within patents that can help an analysis; various other data contained within the patent documents also shed light on the subject you are investigating.

Almost all nations provide online access to their national patent databases. The European Patent Office provides an advanced search engine at http://ep.espacenet.com/advancedSearch?locale=en_EP which allows you to search worldwide. The World Intellectual Property Organization (WIPO) provides the so-called PCT patents online at http://www.wipo.int/pctdb/en/ . Only the USPTO database also contains the citation information. Note that the number of citations of a patent can increase from day to day. Thus, it is important to record the date on which you access the site.
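Because citation counts keep growing, a count is only meaningful together with the date on which it was retrieved. The following Python sketch illustrates the idea of counting how often each patent is cited and storing the access date alongside the counts. The (citing, cited) pairs, the patent labels and the function names are all hypothetical; nothing here reflects the format in which any patent office delivers its data.

```python
from collections import Counter
from datetime import date

# Hypothetical (citing patent, cited patent) pairs collected from patent documents.
citations = [("P3", "P1"), ("P4", "P1"), ("P4", "P2"), ("P5", "P1")]

def forward_citation_counts(pairs):
    """Count how many times each patent is cited by later patents."""
    return Counter(cited for _citing, cited in pairs)

def snapshot(pairs, accessed):
    """Bundle the counts with the access date so the result can be compared later."""
    return {
        "accessed": accessed.isoformat(),
        "counts": dict(forward_citation_counts(pairs)),
    }

result = snapshot(citations, date(2006, 9, 30))
print(result)
```

Two snapshots taken on different dates can then be compared to see which patents gained citations in the meantime.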

Exercise

Go to the USPTO database online at www.uspto.gov and in the left column, click on “patents”, then on “search patents”. On the left hand side of the screen, click on “advanced search” and it will take you to a basic search screen. The “query” box is where you would input various searches. Remember though that it is not a simple word search such as with Google, but the USPTO uses field codes to help narrow your search. The explanations for the various field codes can be found below the query box and if you click on any of them it will give you a more detailed description of what is involved.

Let’s do a basic search:

In the query box type “ttl/computer”. This will provide results for all patents that have the word “computer” in the title. There should be more than 25,000 search results. The first result is the newest patent granted with the search word in the title. Now change the search so that the patent title must contain “computer interface”, and see what you get. You need to add quotation marks around a phrase.

Let’s say we’re interested in touch activated computer interfaces. If we modify the search to ttl/"touch activated computer interface" we get 0 results. But if we now add another field code, spec, to the search terms as follows,

ttl/"computer interface" and spec/touch

we get 36+ results. Adding different search terms allows us to delve deeper into each patent. The “spec” term signifies that the search must look into the description and specifications of the patent; because of the “and”, the phrase “computer interface” must still appear in the title. If we were to use the word “or” instead of “and” we would get over 117,000 results. This is because of the basic logic operators the search uses: “and” keeps only the patents that satisfy both conditions, whereas “or” keeps every patent that satisfies at least one of them.
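The effect of the operators on the number of hits can be illustrated with sets. In this small Python sketch the patent numbers are invented for illustration; the point is only that “and” corresponds to an intersection, “or” to a union, and “andnot” to a difference.

```python
# Hypothetical patent numbers matching each condition on its own.
title_hits = {101, 102, 103}        # e.g. phrase found in the title
spec_hits = {102, 103, 104, 105}    # e.g. word found in the description

both = title_hits & spec_hits        # "and": must satisfy both conditions
either = title_hits | spec_hits      # "or": must satisfy at least one
title_only = title_hits - spec_hits  # "andnot": first condition but not the second

print(len(both), len(either), len(title_only))
```

An “or” query can never return fewer hits than the corresponding “and” query, which is why replacing “and” with “or” makes the result list grow so much.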

A patent can be broken down into many sections:

  • title (ttl)
  • abstract (abst)
  • description and specifications (spec)
  • claims (aclm)

These sections relate directly to the knowledge content within the patents (the “what” part). Other sections relate more to the “who”, “where” and “when” of the patent, such as the company to which the patent is granted (AN), the country in which the patentee is based (ACN), the inventor’s name (IN) and so on. Refer to the help section as described earlier for more examples.

In the help section click on “How to use the advanced search page” and you will see some examples of the nested quick expressions or logic operators and how they work.

Some search logic operators include:

  • and
  • andnot
  • or

Have a look through the “help” section on the advanced search page, and click on “tips on field searching” to familiarise yourself with some of the search language involved, and how to correctly use the nested quick expressions.

For practice, search for patents issued between January 2000 and September 2006 with the title containing the word LED but not related to flashlights that use LEDs. You should get 885 patents. Keep in mind which parts of your query are content search terms and which are operator terms. Wild card operators in the USPTO database are signified by a $.

Isd/200001$->200609$ and ttl/(LED andnot flashlight) andnot spec/flashlight andnot aclm/flashlight
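When you try many variants of such a query, it can help to assemble the string from its parts. The Python sketch below rebuilds the query above from a date range, a title term and an excluded word. The function names are hypothetical helpers of our own, the field codes are written in lower case, and the sketch only produces text for pasting into the query box; it does not contact the USPTO.

```python
def date_range(start_yyyymm, end_yyyymm):
    """Issue-date range using the $ wild card for the day."""
    return f"isd/{start_yyyymm}$->{end_yyyymm}$"

def title_without(term, excluded):
    """Title must contain `term` but not `excluded`."""
    return f"ttl/({term} andnot {excluded})"

def exclude_everywhere(excluded, fields=("spec", "aclm")):
    """Also exclude the word from the description and the claims."""
    return " ".join(f"andnot {field}/{excluded}" for field in fields)

def build_query(start, end, term, excluded):
    """Combine the three parts into one advanced-search string."""
    return (
        f"{date_range(start, end)} and "
        f"{title_without(term, excluded)} "
        f"{exclude_everywhere(excluded)}"
    )

query = build_query("200001", "200609", "LED", "flashlight")
print(query)
```

Changing one argument, for example the excluded word, then yields a new query without retyping the operators.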

Now that you have some of the basics down, and you have narrowed the list of patents that you think are relevant to the research and analyses you want to perform, you can download the relevant patents to your computer. The greatest benefit of an automated download is that you do not have to click on each patent, save it as html, and remember in what order you saved them. Of course, if you need only ten patents you could do that by hand, but if you want to download 900 patents you will regret not using an automated program.

To do this, first define your search in the advanced search page. Proceed only when you are happy with your search terms and the expected results. (Sometimes you may have to cast your search in wider terms to be sure you have collected everything that is relevant, because it is easier to delete what you don’t need in your analysis than to repeat the search process to find all that you need.)

Once your results have been returned, you will see a total number of patents returned.

My search terms were:

ttl/((blu-ray or bluray) andnot (hd-dvd or hddvd)) or abst/((blu-ray or bluray) andnot (hd-dvd or hddvd)) or spec/((blu-ray or bluray) andnot (hd-dvd or hddvd))

It returned 94 patents. (Of course, you can use other search terms.) Click on the button “next 50 hits” and then click on the 4th or 5th patent down the list. Now that you have the patent in front of you, have a look through and read the text, just to see what a patent looks like. We have the title, abstract, inventor name, assignee name, US class, references, claims, description and so on, all of which can be used as search terms as mentioned earlier.

Copy the URL of the patent into a text file and read it. Here is the URL of one of the patents I asked you to search for earlier regarding LEDs.

http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=64&f=G&l=50&d=PTXT&s1=(((%40PD%3E%3D20000100%3C%3D20060931+AND+(LED+NOT+flashlight).TI.)+NOT+(flashlight.BSUM.+or+flashlight.DETD.+or+flashlight.DRWD.))+NOT+(flashlight.CLTX.+or+flashlight.DCTX.))&p=2&OS=Isd/200001$->200609$+and+ttl/(LED+andnot+flashlight)+andnot+spec/flashlight+andnot+aclm/flashlight&RS=(((ISD/200001$->200609$+AND+TTL/(LED+ANDNOT+flashlight))+ANDNOT+SPEC/flashlight)+ANDNOT+ACLM/flashlight)

If you examine it, you can see how the USPTO patent results come about. You can see the operator terms, the search words, the patent number and, in the parameters r=64 and p=2, which result on which page your patent was: in this case, result number 64 on page 2. These two parameters, along with your search terms, are what direct the automated download program. The download program, uspto1.exe, uses Visual Basic code to send requests to the USPTO database using the search terms above; knowing how many results are displayed on one page, it also tells the database in essence to “turn the page” when r reaches 50 or any multiple of 50. That way, the program clicks the “next 50 hits” button so you don’t have to.
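To see how the result number and the page number sit in such an address, the query string can be taken apart with Python’s standard library. This is a sketch, not the method used by uspto1.exe: the URL below is a shortened version of the one above (the search-term parameters are left out), and reading `l=50` as “hits per page” is our assumption based on the “next 50 hits” button.

```python
import math
from urllib.parse import urlsplit, parse_qs

def hit_position(url):
    """Extract result number, hits per page and page number from a result URL."""
    params = parse_qs(urlsplit(url).query)
    result = int(params["r"][0])    # position of the patent in the hit list
    per_page = int(params["l"][0])  # assumed meaning: hits shown per page
    page = int(params["p"][0])      # page of the hit list
    return {
        "result": result,
        "per_page": per_page,
        "page": page,
        # Page on which this result should fall, given the page size.
        "expected_page": math.ceil(result / per_page),
    }

# Shortened, illustrative version of the address above.
url = ("http://patft.uspto.gov/netacgi/nph-Parser"
       "?Sect1=PTO2&Sect2=HITOFF&r=64&f=G&l=50&d=PTXT&p=2")

position = hit_position(url)
print(position)
```

Result 64 with 50 hits per page falls on page 2, which matches the `p=2` in the address; a downloader that increases `r` one at a time therefore has to move to the next page after every 50 results.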

Downloading the Patents

Open a new folder and place the uspto1.exe file into it. Run uspto1.exe and paste the same URL that you looked at earlier into the indicated space. (The program works only with an address like the one above, that is, one provided for a patent beyond the first fifty. The addresses of the first fifty have a different format.) Enter the number of patents and click run. In case of the error message “MSInet.OCX is missing”, please consult http://www.leydesdorff.net/software/patentmaps/ocx.htm

The program will now start the download from the first patent to the indicated number. Once you have your patents in the designated folder, they appear as html files with the name p1.htm, p2.htm, p3.htm etc.

Analysis of Patents

To analyse the patents we open the program uspto2.exe in the same folder as your downloaded patents. When prompted enter how many patents are to be analysed. (UPDATE: http://www.leydesdorff.net/software/patentmaps/index.htm provides updated tools)

This program will search the html files for key words, such as assignee and title, and convert them into dBase files, which are accessible in both Excel and Access. They will be saved in the same folder as your patents and will look like this:

We will be using Access from this point on, so open Access on your computer. Open a new file and click on “blank database”. It will ask you to save it. Save it under whatever name you choose. The next screen will give you a smaller window with tables, queries, forms etc in the left column. Go to file, then get external data then import. Navigate to the files that uspto2.exe produced and double click on each one. Make sure to click only on the .DBF files, not the .DBT files. Once these have been imported, you will see them under the “tables” section in the smaller window. If you click on any of these, it will bring up the table related to them.

Look at all of them and see what data each table holds. For example, clicking on the TI file brings up the patent number, the title, year, date, abstract, application number, etc. (One can use the titles for drawing a semantic map using ti.exe.) If you click on the USCLASS table, it brings up the technological class in which the patent was granted. (With a bit of creativity in database management, you can also export the classes so that you can draw a cosine-based map among them.) The numbers alongside signify whether it is the original class (1) or a cross-reference class (2 and up). If you right-click on any “1” and then click “filter by selection”, only records with 1 in that field are shown, that is, the original classes of all the patents.

As there are different tables, each containing different fields of interest, we need to link them to make sense of them. To do this, go to the main window (F11) and click on the “relationships” button on the main menu. A smaller window should open that asks you which tables you want to add together. Highlight each table you want, then click “add”. In our case, let’s say we want to see the assignee name, the patent number, what primary class the patent is in, what country the assignee calls home and what year the patent was issued. So highlight TI, ASS, USCLASS and INV, then click “add”. Each table now pops up in the working window. You can see that each has a scroll-down list of the fields it contains.

Since we want to show what class, country, assignee and year our patents belong to, we need to link the tables using a unique identifier. Our unique identifier is the order in which the patent was downloaded, as it is the same for all the tables. So find the “nr” in each table and click and drag it to the “nr” in a different table. It will ask you to create a relationship. Click yes. An example is shown in Figure 1.

Figure 1. Relationship window in Access

Once you’ve done that, close the relationship window and save it when prompted.

Now click on Queries in the main window, and create a query in the design window. It will now show the same window as for the relationships. Highlight the tables that have the relevant data, then click add. The tables will appear in a grey window; click and drag the fields that you want to appear from each table.

Figure 2. Query window with fields of interest.

Once you have that, click on the red exclamation mark at the top of the screen to display all your results.

From the results that come up, you can see that some patents are shown more than once, but this is due to the patent belonging to many different US classes, so right-click on the 1 in the CLASSNR field and exclude the other results. You may still see some duplicate records but these are due to the way Access treats each record. To change this, go to the query work window by clicking on the view icon in the main menu (it looks like a pencil and protractor). Then right click on the grey area, then properties and change unique values to “yes”. Then click the red exclamation mark again.
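The same linking and filtering can be expressed as a single SQL query, which may help to see what Access does behind the design window. The Python sketch below uses an in-memory SQLite database as a stand-in for Access. The table names follow the text, but the column names other than `nr` and `classnr`, as well as all the rows, are invented for illustration, and the INV table is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ti (nr INTEGER, title TEXT, year INTEGER);
CREATE TABLE ass (nr INTEGER, assignee TEXT, country TEXT);
CREATE TABLE usclass (nr INTEGER, uclass TEXT, classnr INTEGER);

-- Hypothetical rows; nr is the download order shared by all tables.
INSERT INTO ti VALUES (1, 'LED lamp', 2003), (2, 'LED display', 2005);
INSERT INTO ass VALUES (1, 'Acme', 'NL'), (2, 'Beta', 'US');
INSERT INTO usclass VALUES (1, '362', 1), (1, '313', 2),
                           (2, '345', 1), (2, '345', 1);
""")

# Join the tables on nr, keep only the original class (classnr = 1),
# and use DISTINCT to drop duplicate rows ("unique values" in Access).
rows = conn.execute("""
    SELECT DISTINCT ti.nr, ass.assignee, ass.country, usclass.uclass, ti.year
    FROM ti
    JOIN ass ON ass.nr = ti.nr
    JOIN usclass ON usclass.nr = ti.nr
    WHERE usclass.classnr = 1
    ORDER BY ti.nr
""").fetchall()

for row in rows:
    print(row)
```

Without the `WHERE` clause patent 1 would appear once per class, and without `DISTINCT` patent 2 would appear twice because of its duplicated class record.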

Prof. Loet Leydesdorff provides additional tools for Mapping Patent Data in terms of International Patent Classes (IPC) and for Mapping (USPTO) Patent Data using Overlays of Google Maps.

OPTIONAL (for 1 bonus point): Using a new search query, can you provide a mapping using overlays of Google Maps?

OPTIONAL (for 1 bonus point): Using a new search query, can you provide a mapping of Patent Data in terms of International Patent Classes (IPC)?