A reproducibility study on neural NER

We have re-implemented in DeLFT (our Deep Learning Keras framework for text processing) the main neural architectures for Named Entity Recognition (NER) of the last two years in order to perform a reproducibility analysis. It appears that:

- Although routinely compared in publications, most of the reported results are not directly comparable, because they were obtained with different evaluation criteria.

- Claims about architecture performance are usually not well-founded: the differences in accuracy come more from the evaluation criteria and from hyper-parameter tuning than from the architectures themselves.

- ELMo contextual embeddings are a real breakthrough for NER, boosting performance by 2.0 points of f-score on the CoNLL 2003 NER corpus, but at the cost of a 25-times slower prediction time.

Thanks to some optimisations and parameter tuning, most of our re-implementations outperform the original systems. In particular, the recent best-performing one (Peters et al., 2018) has been improved from an f-score of 92.22 to 92.47 with similar evaluation criteria on the CoNLL 2003 NER corpus.

DeLFT

Our recently born DeLFT (Deep Learning Framework for Text) is a Keras framework for text processing, covering sequence labelling (e.g. named entity tagging) and text classification (e.g. comment classification). Our objective with this library is to re-implement standard state-of-the-art Deep Learning architectures for text processing while keeping in mind the constraints of production environments (efficiency, scalability, integration in the JVM, etc.), which are usually not considered in similar available Open Source projects based on Keras. Keras offers nice abstractions and independence from the Deep Learning back-ends. As we will see in this study, ease of implementation in Keras does not mean a compromise in terms of performance. On...
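To make the sequence-labelling setting concrete, here is a minimal, illustrative Keras sketch of a BiLSTM tagger for CoNLL-style NER. It is not DeLFT's actual code: the vocabulary size, sequence length and layer sizes are placeholder assumptions, and the full architectures studied here additionally use character-level representations, a CRF output layer and, for the best results, ELMo contextual embeddings.

```python
# Minimal BiLSTM sequence tagger in Keras (illustrative sketch only:
# sizes below are placeholders, not DeLFT's actual hyper-parameters).
from tensorflow.keras.layers import (Input, Embedding, Bidirectional, LSTM,
                                     Dropout, TimeDistributed, Dense)
from tensorflow.keras.models import Model

MAX_LEN = 100        # maximum (padded) sentence length
VOCAB_SIZE = 20000   # word vocabulary size (assumption)
EMBED_DIM = 100      # word embedding dimension
NUM_LABELS = 9       # CoNLL 2003 IOB labels: O plus B-/I- for PER, LOC, ORG, MISC

tokens = Input(shape=(MAX_LEN,), dtype="int32", name="tokens")
x = Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(tokens)          # word embeddings
x = Bidirectional(LSTM(100, return_sequences=True))(x)                # contextual encoding
x = Dropout(0.5)(x)
labels = TimeDistributed(Dense(NUM_LABELS, activation="softmax"))(x)  # one label per token

model = Model(inputs=tokens, outputs=labels)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()

# Training on padded token-id sequences X and per-token label ids y:
# model.fit(X_train, y_train, batch_size=32, epochs=10,
#           validation_data=(X_valid, y_valid))
```

On a padded dataset of token-id sequences and per-token label ids, training then reduces to a single model.fit call; the Keras functional API keeps such architecture variants compact and easy to swap.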