
12th April 2020

Testing for accuracy using independently created Gold Standards


Data science and machine learning helps us better manage and shape our portfolio; and operate more efficiently and at scale so that we can execute on our patent strategy.

Mike Lee, Director, Head of Patents, Google

Cipher’s ML technology is custom built for classifying patents, not a standard text system that’s been repurposed. We have dedicated processing for the data that makes patents unique, in order to get the most accurate classification.

One of the great things about ML algorithms is that it’s easy to test their accuracy scientifically. To do that, we took test data generated by a third party (Tony Trippe, Patinformatics), which was split into two parts: patents inside the topic, and patents outside the topic (but still relevant, drawn from adjacent technologies).

We then trained Cipher’s classifiers on a portion of the data and tested them against the remainder, using a process described in this paper.
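The train-on-a-portion, test-on-the-remainder procedure can be sketched roughly as follows. This is an illustrative sketch only, not Cipher’s implementation or the exact protocol from the paper: the data is synthetic, the one-feature threshold classifier is a toy stand-in, and the names `train_threshold`, `accuracy` and `repeated_holdout` are hypothetical.

```python
import random
from statistics import mean


def train_threshold(examples):
    """Toy stand-in classifier (an assumption, not Cipher's model).

    Learns a cut-off halfway between the mean feature value of the
    positive and negative training examples. It assumes positives
    tend to have the higher feature values.
    """
    pos = [x for x, label in examples if label]
    neg = [x for x, label in examples if not label]
    if not pos or not neg:
        raise ValueError("training set needs examples of both classes")
    return (mean(pos) + mean(neg)) / 2


def accuracy(threshold, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum((x > threshold) == label for x, label in examples)
    return correct / len(examples)


def repeated_holdout(examples, train_size, runs, seed=0):
    """Average accuracy over `runs` random train/test splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        # Train on the first `train_size` items, test on the remainder.
        train, test = shuffled[:train_size], shuffled[train_size:]
        scores.append(accuracy(train_threshold(train), test))
    return mean(scores)


# Synthetic data: one numeric feature per "patent family".
data_rng = random.Random(42)
examples = [(data_rng.gauss(1.0, 1.0), True) for _ in range(500)]
examples += [(data_rng.gauss(-1.0, 1.0), False) for _ in range(500)]

for size in (150, 250, 350):
    print(size, round(repeated_holdout(examples, size, runs=100), 3))
```

Averaging over many random splits smooths out the luck of any single split, so differences between training set sizes are easier to compare.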

The two topics we’ve tested Cipher on are Quantum Computing Q-bit Generation and Cannabinoid Edibles. Results are shown for three training set sizes (the training set being the patents used to train the classifier): small at 150 families, medium at 250, and large at 350. The results of the tests, averaged over 100 runs, are:

 

Test                   Accuracy by training set size
                       Small      Medium     Large
Quantum Computing      91.6%      96.8%      97.2%
Cannabinoid Edibles    91.5%      97.5%      98.7%

 

The definitions of the technologies are:

  • Quantum Computing Q-bit Generation: Qubit Generation for Quantum Computing refers to patents that discuss the various means of generating qubits for use in a quantum-mechanics-based computing system. Types of qubits include superconducting loops, topological, quantum-dot-based and ion-trap methods, as well as others. The excluded technologies are applications, algorithms and other auxiliary aspects of quantum computing that do not mention a hardware component, and hardware for other quantum phenomena outside of qubit generation. The test data consists of 2,282 positive example patents and 2,801 negative examples (from adjacent technologies).
  • Cannabinoid Edibles: The positive collection covers edible items, which can include lozenges, beverages, or powders containing a cannabinoid substance that can be used directly by oral absorption, or by formulating into a foodstuff for oral consumption. Cannabinoid substances include products from Cannabis sativa, ruderalis, or indica, as well as products coming from the processing of hemp, including hemp seeds, fibers, or oils. All of the records in the negative collection mention an edible item of one sort or another, and specifically a foodstuff. The test data consists of 1,603 positive example patents and 9,191 negative examples (from adjacent technologies).
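
As context for reading accuracy figures on these two data sets, the positive and negative counts quoted in the definitions give the accuracy a trivial classifier would reach by always predicting the more common class. The short calculation below is an added illustration, not part of the published study, and it assumes the data used for testing has roughly the same class mix as the full test set; `majority_baseline` is a hypothetical helper name.

```python
# Positive/negative counts as quoted in the topic definitions.
test_sets = {
    "Quantum Computing": {"positive": 2282, "negative": 2801},
    "Cannabinoid Edibles": {"positive": 1603, "negative": 9191},
}


def majority_baseline(positive, negative):
    """Accuracy of always predicting the more common class."""
    return max(positive, negative) / (positive + negative)


for topic, counts in test_sets.items():
    print(f"{topic}: {majority_baseline(**counts):.1%}")
# Quantum Computing: 55.1%
# Cannabinoid Edibles: 85.1%
```

A classifier only adds value to the extent that it beats this baseline, which is why the class balance of each test set is worth keeping in mind alongside the headline accuracy.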

Cipher is a pioneer in using supervised machine learning for the binary classification of patents, and our confidence comes not only from customer feedback, but also from peer-reviewed scientific evidence.

Download the full Academic Study PDF
