In commercial research and development projects, new chemical compounds and reactions are often disclosed publicly in patents. Only a small proportion of these compounds are published in journals, usually a few years after the patent. Patent authorities make the patents available but do not provide systematic, continuous chemical annotations. Various text-mining approaches exist to extract chemical information from patents, but less attention has been given to the relevancy of a compound within a patent. The relevancy of a compound depends on the patent’s context: a relevant compound plays a major role within the patent. Identifying relevant compounds reduces the size of the extracted data and improves the usefulness of patent resources (e.g. it supports identifying the main compounds). Annotators of databases like Reaxys only annotate relevant compounds.
Using advances in artificial intelligence (AI), machine learning (ML) and natural language processing (NLP), we have developed models to overcome these limitations. Through a shared evaluation campaign, we have also invited academic and industrial teams to further develop, improve and contribute to the domain of patent information extraction.
The webinar will discuss:
- The challenges of patent mining in the chemical domain
- Chemical information extraction: from relevant document to relevant section to relevant information
- How to create a quality training set for machine learning in chemistry
- The ChEMU shared task for named entity and event extraction
About the speaker:
Saber Akhondi obtained his MSc degree in Bioinformatics and Systems Biology from Chalmers University of Technology, Sweden. In 2011 he started as a PhD student in the Biosemantics group at Erasmus Medical Center Rotterdam. He currently works at Elsevier as a Principal NLP Scientist, where he applies NLP and machine learning techniques to extract information useful for large commercial and research communities.