In traditional patent search and landscaping, boolean search and IPC codes have played a major role. However, both involve a fundamental tradeoff between completeness (how many of the relevant patents you retrieve) and accuracy (how many of the retrieved patents are actually relevant). The more words or IPC codes you include in your search, the more noise you get back, but the more you exclude, the more relevant patents fall out of your results. This makes a proper search very difficult.
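To make the tradeoff concrete, here is a minimal sketch in Python. The patent IDs and result sets are made up for illustration. Completeness is what is usually called recall, and accuracy is what is usually called precision.

```python
def completeness(retrieved: set, relevant: set) -> float:
    """Share of all relevant patents that the search retrieved (recall)."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def accuracy(retrieved: set, relevant: set) -> float:
    """Share of the retrieved patents that are relevant (precision)."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


# Hypothetical IDs: four patents are truly relevant to the search goal.
relevant = {"P1", "P2", "P3", "P4"}

# A narrow query returns no noise but misses half of the relevant patents.
narrow = {"P1", "P2"}

# A broad query finds every relevant patent but also pulls in eight noise hits.
broad = relevant | {"N1", "N2", "N3", "N4", "N5", "N6", "N7", "N8"}

for name, retrieved in [("narrow", narrow), ("broad", broad)]:
    print(name, completeness(retrieved, relevant), accuracy(retrieved, relevant))
```

In this toy case the narrow query is fully accurate but only half complete, while the broad query is fully complete but only one in three of its results is relevant.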
Other downsides of boolean and IPC-code-based search
Another major problem with boolean search is that you need to know in advance every term you are looking for, including synonyms, misspellings, and alternative ways of describing the same phenomenon. The odds of identifying every, or even most, possible ways of describing a technology are not great. A way around this problem is to incorporate IPC codes, as you can retrieve many relevant patents without needing to know exactly how to describe them. However, IPC codes have their own problems: very similar technologies can receive different codes, while different technologies can end up under the same code.
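The vocabulary problem is easy to reproduce. The sketch below runs a simple AND query over three made-up abstracts (written for this illustration, not taken from real patents). Only the abstract that uses the exact query words is found.

```python
import re

# Hypothetical abstracts, written for illustration only.
abstracts = {
    "A": "A rechargeable battery with a lithium anode.",
    "B": "A lithium accumulator that can be charged repeatedly.",  # synonym
    "C": "A rechargable batery pack for vehicles.",  # misspelled on purpose
}


def boolean_and(text: str, terms: list) -> bool:
    """True if every term occurs as a whole word in the text (case-insensitive)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return all(term in words for term in terms)


query = ["rechargeable", "battery"]
hits = [pid for pid, text in abstracts.items() if boolean_and(text, query)]
print(hits)  # only abstract A matches; B and C are missed
```

Catching B and C with boolean search would require adding "accumulator" and every likely misspelling to the query by hand, which is exactly the knowledge the searcher may not have.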
The next generation of search
With modern machine learning methods, it is no longer necessary to rely solely on boolean search or IPC codes. Search and landscaping can now be done by training an algorithm. Focus' search algorithm relies on user feedback. First, the user gives examples of patents that are similar to what the user is looking for. These positive examples teach the algorithm what to look for. Then, the user specifies how many patents the algorithm should retrieve. Once the algorithm has retrieved that number of patents most closely related to the input, the user can give feedback by marking which results are correct and which are not. This way of working has two major benefits.
- Non-experts can do expert-level searches.
- Higher search completeness with higher accuracy can be achieved at the same time.
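The feedback loop described above can be sketched in a few lines of Python. This is not Focus' actual algorithm. It is a minimal illustration that assumes plain word counts and a Rocchio-style update (the average of the positive examples minus a weighted average of the rejected ones). The texts, the function names, and the `gamma` weight are all hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average of several word-count vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def build_query(positives, negatives, gamma=0.5):
    """Rocchio-style profile: mean of positives minus gamma * mean of negatives."""
    query = dict(centroid(positives))
    if negatives:
        for term, weight in centroid(negatives).items():
            query[term] = query.get(term, 0.0) - gamma * weight
    return query


def search(corpus, query, k):
    """Return the IDs of the k documents most similar to the query profile."""
    scored = [(cosine(query, vectorize(text)), pid) for pid, text in corpus.items()]
    return [pid for _, pid in sorted(scored, reverse=True)[:k]]


# Hypothetical seed example and corpus, written for illustration only.
seed = "wind turbine blade with serrated trailing edge"
corpus = {
    "P1": "rotor blade with serrated trailing edge",
    "P2": "wind turbine tower with concrete foundation",
    "P3": "rotor blade trailing edge brushes for noise reduction",
    "P4": "wind turbine gearbox lubrication",
}

# Round 1: the user supplies one positive example and asks for two results.
first = search(corpus, build_query([vectorize(seed)], []), k=2)

# Feedback: the user marks P1 as correct and P2 as incorrect.
positives = [vectorize(seed), vectorize(corpus["P1"])]
negatives = [vectorize(corpus["P2"])]

# Round 2: the updated profile pushes the rejected patent down the ranking.
second = search(corpus, build_query(positives, negatives), k=2)
print(first, second)
```

In the first round the tower patent ranks second because it shares "wind turbine" with the seed. After one round of feedback it drops out of the top two and the blade-noise patent takes its place. Word counts cannot recognize synonyms, so a real system would need a richer text representation than this sketch uses.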
Expert-level searches for all
The only thing that is required for a great search result is the ability to recognize whether a retrieved patent is relevant to your search goal or not. There is no need to know any terms. The only thing you need to do is feed the algorithm similar patents and determine if patents retrieved by the algorithm are correct or incorrect. The algorithm will do everything else for you.
Higher completeness and accuracy at the same time
Because the algorithm learns from your input, it forms its own definition of a relevant patent. This definition is not limited to specific words or IPC codes but is informed by our natural language processing algorithms. This means that the algorithms identify synonyms, similar context, etc. all by themselves, without the user having to do so. This makes it possible to generate much more complete search results that are simultaneously less noisy. The more time you are willing to spend on training the algorithm, the less noise you will get. Depending on your search criteria, a good search can take anywhere from a few minutes to several hours. The more nuanced the difference between a relevant and an irrelevant patent, the more time it will take to teach the algorithm how to recognize it.
On the back-end, features of the search algorithm are up and running and we are now working on an interface to open it up to you guys. We estimate this will be done in Q4 of 2020. Feel free to reach out for more information or with questions about our search algorithm.