WIPO Technology Trends report on artificial intelligence

SCIENCE-MINER is dedicated to the development of Open Source tools for Technical and Scientific Information, with a focus on Open Science. We develop and exploit state-of-the-art machine learning techniques, for instance with GROBID, entity-fishing and DeLFT, which make new usages and approaches to scientific information possible in a reliable and scalable manner.

As an opportunity to put machine learning and scientific information in a larger perspective, we recently participated in the WIPO Technology Trends report on artificial intelligence, together with CNRS Innovation, analyzing scholarly and patent literature on AI dating back to the 1950s. We identified nearly 340,000 AI-related patent families and over 1.6 million relevant scientific publications, and also considered market trends related to AI (data on acquisitions, funding, open source, and patent litigation and oppositions). We analyzed this corpus against various criteria, such as time, geographical distribution, techniques, application fields and public/private sectors, to better understand how research and development on AI have evolved over time and to measure its current boom: 50 percent of all AI patents have been published in just the last five years. To our knowledge, this is the first time that such a comprehensive and heterogeneous corpus has been exploited to analyze trends in Artificial Intelligence, and we hope that this vast quantitative analysis will be helpful for future AI studies.

We introduced 103 thematic AI clusters and sub-clusters and considered several hundred classes from the three main patent classification schemes (CPC, IPC and FI/F-Terms), combined with hundreds of terms/synonyms, to organize the identified publications at a meaningful level of granularity. A detailed background paper is available, describing the...
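
As a rough illustration of how a corpus can be organized this way, the sketch below assigns documents to thematic clusters by matching their patent classification codes and their abstract text against per-cluster keyword lists. The cluster names, classification prefixes and keywords are purely hypothetical examples for the sake of the sketch, not the actual taxonomy used in the report.

```python
# Hypothetical mini-taxonomy: each thematic cluster is defined by patent
# classification prefixes (CPC/IPC-style) and by keywords/synonyms.
CLUSTERS = {
    "machine_learning": {
        "class_prefixes": ["G06N20", "G06N3"],   # illustrative codes only
        "keywords": ["machine learning", "neural network", "deep learning"],
    },
    "computer_vision": {
        "class_prefixes": ["G06T7", "G06V"],
        "keywords": ["image recognition", "object detection"],
    },
}

def assign_clusters(doc):
    """Return the thematic clusters a document belongs to, based on its
    classification codes and on keyword matches in its abstract."""
    text = (doc.get("abstract") or "").lower()
    codes = doc.get("classification_codes", [])
    matched = set()
    for name, spec in CLUSTERS.items():
        if any(code.startswith(p) for code in codes for p in spec["class_prefixes"]):
            matched.add(name)
        elif any(kw in text for kw in spec["keywords"]):
            matched.add(name)
    return sorted(matched)

# Fabricated example document, for illustration only
doc = {
    "abstract": "A deep learning method for object detection in images.",
    "classification_codes": ["G06N3/08"],
}
print(assign_clusters(doc))   # ['computer_vision', 'machine_learning']
```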
A reproducibility study on neural NER

We have re-implemented in DeLFT (our Deep Learning Keras framework for text processing) the main neural architectures for Named Entity Recognition (NER) of the last two years in order to perform a reproducibility analysis. It appears that:

- although routinely compared in publications, most of the reported results are not directly comparable, because they were obtained with different evaluation criteria;
- claims on architecture performance are usually not very well-founded: the differences in accuracy come more significantly from different evaluation criteria and hyper-parameter tuning;
- ELMo contextual embeddings are a real breakthrough for NER, boosting performance by 2.0 points in f-score on the CoNLL 2003 NER corpus, but at the cost of a 25-times slower prediction time.

Thanks to some optimisations and parameter tuning, most of our re-implementations outperform the original systems; in particular, the recent best-performing one (Peters et al., 2018) has been improved from an f-score of 92.22 to 92.47 with similar evaluation criteria on the CoNLL 2003 NER corpus.

DeLFT

Our recently born DeLFT (Deep Learning Framework for Text) is a Keras framework for text processing, covering sequence labelling (e.g. named entity tagging) and text classification (e.g. comment classification). Our objective with this library is to re-implement standard state-of-the-art Deep Learning architectures for text processing with the constraints of production environments in mind (efficiency, scalability, integration in the JVM, etc.), which are usually not considered in similar available Open Source projects based on Keras. Keras offers nice abstractions and independence from Deep Learning back-ends. As we will see in this study, the ease of implementation in Keras does not mean a compromise in terms of performance. On...
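
To make the sequence-labelling side more concrete, here is a minimal Keras sketch of a BiLSTM token tagger, the kind of architecture these NER systems build on. The vocabulary size, tag set and hyper-parameters are illustrative only, and it omits the character embeddings, CRF output layer and ELMo inputs used in the actual architectures; it is not DeLFT's API, just a plain Keras sketch.

```python
import numpy as np
from tensorflow.keras import layers, models

# Illustrative sizes, not the actual DeLFT hyper-parameters
VOCAB_SIZE = 20000   # word vocabulary
EMBED_DIM = 300      # e.g. GloVe-style word embeddings
LSTM_UNITS = 100
NUM_TAGS = 9         # CoNLL 2003 IOB tags: O + B-/I- for 4 entity types
MAX_LEN = 50         # padded sentence length

# BiLSTM tagger: one softmax prediction per token
inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(inputs)
x = layers.Bidirectional(layers.LSTM(LSTM_UNITS, return_sequences=True))(x)
x = layers.Dropout(0.5)(x)
outputs = layers.TimeDistributed(layers.Dense(NUM_TAGS, activation="softmax"))(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()

# Toy training call on random data, just to show the expected tensor shapes
X = np.random.randint(1, VOCAB_SIZE, size=(32, MAX_LEN))
y = np.random.randint(0, NUM_TAGS, size=(32, MAX_LEN))
model.fit(X, y, epochs=1, batch_size=8)
```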