Season 2 Episode 3
7th April 2022
Machine Learning for Patents
From chemistry to Machine Learning for patents
Chemistry was the only subject in school that I ever got a C in. Specifically organic chemistry. But I found it fascinating, and I found it a challenge. I actually synthesised the first clinical batch of the drug that went on to become Viracept.
All patent people have some sort of technical background of one type or another, and so mine just happened to be chemistry. Chemistry is a central science. I went from the lab to working with chemical information to working with patent information, working with patent analytics, patent intelligence.
Then just in the past few years, with the advent of machine learning, and the real surge of machine learning in patent related areas, I started to work along those lines as well.
The history behind ML4Patents
There seemed to be a fair amount of misinformation.
There are some vendors who basically were going around saying, “Well, look, you don’t need to read patents anymore. And all of this searching that you’ve been doing, and all this manual review, and basically what amounts to 50 years of patent related items, you just have to throw it out the window. It’s old school, it’s not the way things ought to be done anymore. [And] you should be using these machine learning based methods.”
On the flip side, you also had a number of, for lack of a better term, old-school Boolean-based searchers, who were quite adamant that you don’t know what’s going on in the black box, you can’t trust the system, you can’t evaluate the results.
The reality is somewhere in the middle. I thought that it was about time that there were resources available that took a middle ground, or at least provided an unbiased view of what was going on.
The impact of Machine Learning on patent analytics
There are activities that used to take a month that can now be completed in a couple of days. There are activities that would have been impossible for somebody without advanced training and access to very expensive databases that can now be done in a few hours.
All of that has been driven forward by the advent of these machine learning algorithms and technologies.
The patent corpus is challenging. It’s different than written text. It’s different from other types of documents.
But now enough time has passed, and enough organisations have gotten involved, that you’re seeing real headway, real progress, in being able to apply what’s been learned in those other areas and take it into the patent world.
Measuring the accuracy of Machine Learning
One of the big things that people still talk about is trust. People say, “I couldn’t possibly use a machine learning tool, because everybody says you can’t trust it, that there isn’t any transparency, that it’s a black box.” But what we attempted to do was create a gold standard collection in a couple of different technologies.
What that allows you to do is then be able to make meaningful comparisons, and do meaningful evaluations. It’s demonstrating to people that this is for real, and they should be investing more time and effort into getting involved.
Cipher is hugely grateful for the collaboration with Tony. He was capable of doing a very difficult job of reading 1,500 patents relating to qubits or cannabinoids, and putting them into piles so that we could run our algorithm against an independent test set.
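To illustrate how a gold standard collection makes such comparisons possible, here is a minimal sketch in Python. It assumes each patent has been labelled relevant or not relevant by a human reviewer, and that a classifier has produced its own labels for the same patents. The patent identifiers, the labels, and the `evaluate` function are all hypothetical and for illustration only; this is not Cipher's actual evaluation method.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float  # share of predicted-relevant patents that truly are relevant
    recall: float     # share of truly relevant patents that the classifier found
    f1: float         # harmonic mean of precision and recall


def evaluate(gold: dict[str, bool], predicted: dict[str, bool]) -> Scores:
    """Compare classifier labels against human gold standard labels."""
    tp = fp = fn = 0
    for patent_id, is_relevant in gold.items():
        # A patent the classifier did not label is treated as "not relevant".
        guess = predicted.get(patent_id, False)
        if guess and is_relevant:
            tp += 1
        elif guess and not is_relevant:
            fp += 1
        elif not guess and is_relevant:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return Scores(precision, recall, f1)


# Hypothetical labels: True means "belongs to the technology", False means it does not.
gold = {"P1": True, "P2": True, "P3": False, "P4": False, "P5": True}
predicted = {"P1": True, "P2": False, "P3": True, "P4": False, "P5": True}

scores = evaluate(gold, predicted)
print(f"precision={scores.precision:.2f} recall={scores.recall:.2f} f1={scores.f1:.2f}")
```

Because the gold labels come from a reviewer working independently of the tool, scores like these give a shared yardstick for comparing different tools, or different versions of the same tool, on the same set of patents.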
Adopters of Machine Learning
Automation is coming to everybody’s jobs, and you can either look at that fearfully, or you can actually embrace it and get excited about the efficiencies that it creates, the opportunity it creates for you to do more value-added work.
That’s another underlying idea behind ML4Patents. The people in this industry have more to contribute than just being able to do searching or being able to create these buckets, and put documents in the piles.
There’s just so much more that they’re capable of being able to provide if they can get those really tedious, manual, time-consuming tasks off their plate.
Now we’re going through another one of those changes, where instead of relying on the process that would take them four weeks to put together valuable business insight, they can start doing that now in a much shorter time period.
It’s also opening up additional avenues for insight because more visualisation types are available, more types of analysis results are available. Then you apply that to decision making and now you start feeling much, much more confident about the direction that you’re about to take your organisation because you’re coming at it from really great data, really great analysis, and lots of great insight.
What the future holds
Within the next five years, patent searching with machine learning based tools is going to feel the same as Boolean searching seemed to those people who used to search with punch cards or printed indexes.
We’re really getting there, and the advent of tools like Cipher is getting us to the point where people can have these resources available, access them quickly, get valuable insight quickly, and then use it for more and more of their decision making.
Message from the CEO
There’s so much scientific knowledge locked within patents that you have to ask why so few people have access to what is often described, as we’ve discussed today, as the largest library of scientific information in the world.
The conundrum is more puzzling when you realise that there is a profession of analysts like Tony trained to extract the insight hidden in plain sight. I think the answer lies in the reality that analysing patents has until recently been a specialist sport.
The advent of a range of AI and machine learning technologies has opened up the data to those, Tony would say everyone, who need to benefit from it. This is the time, as we say at Cipher, to unleash the strategic value of patents as a treasure chest of information available to all of us.