12th April 2020

Testing for accuracy using independently created Gold Standards


Data science and machine learning helps us better manage and shape our portfolio; and operate more efficiently and at scale so that we can execute on our patent strategy.

Mike Lee, Director, Head of Patents, Google

Patent Classification Accuracy

Cipher’s ML technology is custom-built for classifying patents; it is not a standard text system that has been repurposed. We have dedicated processing for the data that makes patents unique, in order to get the most accurate classification.

One of the great things about ML algorithms is that it’s easy to test their accuracy scientifically. To do that, we took test data generated by a third party (Tony Trippe, Patinformatics), which was split into two parts: patents inside the topic, and patents outside the topic (but still relevant, drawn from adjacent technologies).

We then trained Cipher’s classifiers on a portion of the data and tested them against the remainder, using a process described in our paper, Construction and evaluation of gold standards for patent classification, published in World Patent Information.

The two topics we’ve tested Cipher on are Quantum Computing Qubit Generation and Cannabinoid Edibles. The results are shown for three sizes of training set (the patents used to train the classifier): small at 150 families, medium at 250 families, and large at 350 families. The results of the tests, averaged over 100 runs, are:


Training set size        Small     Medium    Large
Test accuracy
  Quantum Computing      91.6%     96.8%     97.2%
  Cannabinoid Edibles    91.5%     97.5%     98.7%
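The evaluation loop described above (train on a fixed-size sample, test on the remainder, average over many runs) can be sketched roughly as follows. This is a minimal illustration only, not Cipher's implementation or the exact procedure from the paper: the function names, the toy threshold classifier and the synthetic one-feature data are all assumptions made for the example, and the page does not say how each training sample was drawn.

```python
import random
from statistics import mean


def repeated_holdout_accuracy(items, labels, train_size, fit, runs=100, seed=0):
    """Average test accuracy over `runs` random train/test splits.

    Each run samples `train_size` items for training and scores the
    fitted classifier on all of the remaining items.
    """
    if not 0 < train_size < len(items):
        raise ValueError("train_size must leave at least one test item")
    rng = random.Random(seed)
    indices = list(range(len(items)))
    accuracies = []
    for _ in range(runs):
        rng.shuffle(indices)
        train_idx, test_idx = indices[:train_size], indices[train_size:]
        predict = fit([items[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(items[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return mean(accuracies)


def fit_threshold(train_items, train_labels):
    """Hypothetical stand-in classifier: threshold halfway between class means."""
    pos = [x for x, y in zip(train_items, train_labels) if y]
    neg = [x for x, y in zip(train_items, train_labels) if not y]
    if not pos or not neg:
        # Degenerate sample containing one class only: always predict it.
        only_class = bool(pos)
        return lambda x: only_class
    threshold = (mean(pos) + mean(neg)) / 2
    pos_is_higher = mean(pos) > mean(neg)
    return lambda x: (x > threshold) == pos_is_higher


# Synthetic data, purely for illustration (not patent data).
data_rng = random.Random(42)
items = ([data_rng.gauss(2.0, 1.0) for _ in range(300)]
         + [data_rng.gauss(-2.0, 1.0) for _ in range(300)])
labels = [True] * 300 + [False] * 300

for size in (150, 250, 350):
    acc = repeated_holdout_accuracy(items, labels, size, fit_threshold, runs=100)
    print(f"train_size={size}: mean test accuracy {acc:.3f}")
```

Passing the classifier in as a `fit` callable keeps the evaluation loop independent of the model, so the same harness could score any classifier that returns a prediction function.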


The definitions of the technologies are:

  • Quantum Computing Qubit Generation: Qubit Generation for Quantum Computing refers to patents that discuss the various means of generating qubits for use in a quantum-mechanics-based computing system. Types of qubits include superconducting loops, topological, quantum-dot-based and ion-trap methods, as well as others. The excluded technologies are applications, algorithms and other auxiliary aspects of quantum computing that do not mention a hardware component, and hardware for other quantum phenomena outside of qubit generation. The test data consists of 2,282 positive example patents and 2,801 negative examples (from adjacent technologies).
  • Cannabinoid Edibles: The positive collection covers edible items, which can include lozenges, beverages, or powders containing a cannabinoid substance that can be used directly by oral absorption, or by formulating into a foodstuff for oral consumption. Cannabinoid substances include products from Cannabis sativa, ruderalis, or indica, as well as products coming from the processing of hemp, including hemp seeds, fibers, or oils. All of the records in the negative collection mention an edible item of one sort or another, and specifically a foodstuff. The test data consists of 1,603 positive example patents and 9,191 negative examples (from adjacent technologies).
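As a rough aid to reading the accuracy figures, the example counts above give the score a trivial classifier would get by always predicting the larger class. The sketch below is simple arithmetic on the stated counts; the helper name `majority_baseline` is hypothetical, and it assumes the held-out test portion keeps roughly the same class balance as the full collection, which the page does not state.

```python
def majority_baseline(positives, negatives):
    """Accuracy obtained by always predicting the more common class."""
    return max(positives, negatives) / (positives + negatives)


# (positive, negative) example counts as stated in the definitions above.
datasets = {
    "Quantum Computing": (2282, 2801),
    "Cannabinoid Edibles": (1603, 9191),
}

for name, (pos, neg) in datasets.items():
    baseline = majority_baseline(pos, neg)
    print(f"{name}: {pos + neg} examples, majority-class baseline {baseline:.1%}")
# Quantum Computing: 5083 examples, majority-class baseline 55.1%
# Cannabinoid Edibles: 10794 examples, majority-class baseline 85.1%
```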

Cipher is a pioneer in using supervised machine learning for the binary classification of patents, and our confidence comes not only from customer feedback, but also from peer-reviewed scientific evidence.

Download the full Academic Study PDF

Learn more about how Cipher classifies patent data into technology buckets using Machine Learning.


