12th April 2020
Testing for accuracy using independently created Gold Standards
Data science and machine learning helps us better manage and shape our portfolio; and operate more efficiently and at scale so that we can execute on our patent strategy.
Mike Lee, Director, Head of Patents, Google
Patent Classification Accuracy
Cipher’s ML technology is custom-built for classifying patents, rather than a standard text system that’s been repurposed. We have dedicated processing for the data that makes patents unique, in order to get the most accurate classification.
One of the great things about ML algorithms is that their accuracy is easy to test scientifically. To do that, we took test data generated by a third party (Tony Trippe, Patinformatics), which was split into two parts: inside the topic, and outside the topic (but still relevant).
We then trained Cipher’s classifiers on a portion of the data and tested them against the remainder, using the process described in our paper, “Construction and evaluation of gold standards for patent classification”, published in World Patent Information.
The two topics we’ve tested Cipher on are Quantum Computing Q-bit Generation and Cannabinoid Edibles. The results are shown for three training set sizes (the training set being the patents used to train the classifier): small at 150 families, medium at 250 families, and large at 350 families. The results of the tests, averaged over 100 runs, are:
| Training set size | Small | Medium | Large |
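The procedure described above (train on a fixed-size random subset, test on the remainder, average over repeated runs) can be sketched roughly as follows. This is a minimal illustration, not Cipher's implementation: `repeated_holdout` and `train_keyword_model` are hypothetical names, the keyword model is a toy stand-in for a real classifier, the data is synthetic, and plain accuracy is assumed as the metric.

```python
import random
from statistics import mean


def repeated_holdout(examples, train_size, runs, train_fn, seed=0):
    """Return mean accuracy over `runs` random train/test splits.

    examples: list of (text, label) pairs; label is True when in topic.
    train_fn: takes the training pairs and returns a predict(text) -> bool.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        shuffled = list(examples)
        rng.shuffle(shuffled)
        # The first `train_size` items train the model; the rest test it.
        train, test = shuffled[:train_size], shuffled[train_size:]
        predict = train_fn(train)
        correct = sum(predict(text) == label for text, label in test)
        scores.append(correct / len(test))
    return mean(scores)


def train_keyword_model(train):
    """Toy stand-in: flag a text containing any word seen only in positives."""
    pos_words, neg_words = set(), set()
    for text, label in train:
        (pos_words if label else neg_words).update(text.lower().split())
    signal = pos_words - neg_words
    return lambda text: any(word in signal for word in text.lower().split())


# Synthetic placeholder data (assumption), not the gold-standard sets.
toy = [("superconducting qubit generation circuit", True)] * 300
toy += [("quantum algorithm for optimisation software", False)] * 300

# Training set sizes and run count taken from the text above.
results = {
    size: repeated_holdout(toy, size, runs=100, train_fn=train_keyword_model)
    for size in (150, 250, 350)
}
for size, accuracy in results.items():
    print(f"{size} training examples: mean accuracy {accuracy:.3f}")
```

Fixing the seed makes the splits reproducible, and averaging over many random splits reduces the chance that a single lucky or unlucky split dominates the reported figure.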
The definitions of the technologies are:
- Quantum Computing Q-bit Generation: Qubit Generation for Quantum Computing refers to patents that discuss the various means of generating qubits for use in a computing system based on quantum mechanics. Types of qubits include superconducting loops, topological, quantum-dot-based and ion-trap methods, as well as others. The excluded technologies are applications, algorithms and other auxiliary aspects of quantum computing that do not mention a hardware component, and hardware for other quantum phenomena outside of qubit generation. The test data consists of 2,282 positive example patents and 2,801 negative examples (from adjacent technologies).
- Cannabinoid Edibles: The positive collection covers edible items, which can include lozenges, beverages, or powders containing a cannabinoid substance that can be used directly by oral absorption, or by formulating into a foodstuff for oral consumption. Cannabinoid substances include products from Cannabis sativa, ruderalis, or indica as well as products coming from the processing of hemp including hemp seeds, fibers, or oils. All of the records in the negative collection mention an edible item of one sort or another, and specifically a foodstuff. The test data consists of 1,603 positive example patents, and 9,191 negative examples (from adjacent technologies).
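As a rough aid for reading accuracy figures on these two data sets, the sketch below computes the share of the larger class in each, using the positive and negative counts quoted in the definitions above. This is the accuracy a trivial classifier would score by always predicting the more common class. It is not part of the published method, and the helper name `majority_baseline` is an illustrative assumption.

```python
# (positive, negative) example counts quoted in the definitions above.
datasets = {
    "Quantum Computing Q-bit Generation": (2282, 2801),
    "Cannabinoid Edibles": (1603, 9191),
}


def majority_baseline(positives, negatives):
    """Accuracy of always predicting the more common class."""
    return max(positives, negatives) / (positives + negatives)


baselines = {
    name: majority_baseline(pos, neg) for name, (pos, neg) in datasets.items()
}
for name, baseline in baselines.items():
    print(f"{name}: majority-class baseline {baseline:.1%}")
# Quantum Computing Q-bit Generation: majority-class baseline 55.1%
# Cannabinoid Edibles: majority-class baseline 85.1%
```

Because the Cannabinoid Edibles set has far more negative than positive examples, its baseline is higher, so an accuracy figure for that topic is best read against this floor.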
Cipher is a pioneer in using supervised machine learning for the binary classification of patents, and our confidence comes not only from customer feedback but also from peer-reviewed scientific evidence.