DeLFT (Deep Learning Framework for Text) is an open source Keras framework for text processing, covering sequence labeling (e.g. named entity tagging) and text classification (e.g. comment classification). The library re-implements standard state-of-the-art Deep Learning architectures and applies them in particular to the following tasks:
- Named Entity Recognition: DeLFT includes state-of-the-art models for English (92.47 f-score on CoNLL 2003, versus 92.22 reported by Peters et al. 2018, and 87.01 f-score on OntoNotes 5) and for French; see our blog entry on this.
- Deep Learning versions of GROBID models (citation, dates, header, affiliation-address, etc.)
- Scholarly citation context classification (positive, neutral, negative citations) with state-of-the-art performance (92.59 f-score versus 89.8 reported for the previous best-performing SVM implementation)
- Insult recognition and toxic comment classification for supporting automated moderation
The library is oriented toward production use, with very compact models (less than 2 MB, all models are included in the distribution), efficient management of embeddings (no load time, not loaded in memory, no runtime impact) and the option to stream training data for large-scale datasets that do not fit in memory.
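
To illustrate the general idea behind streaming training data, here is a minimal sketch in plain Python. It is not DeLFT's own data-loading API: the function name `stream_batches`, the tab-separated `text<TAB>label` file layout and the `batch_size` parameter are all assumptions made for this example. The point is only that a generator reads one batch at a time, so the full dataset never has to be held in memory.

```python
from typing import Iterator, List, Tuple


def stream_batches(path: str, batch_size: int = 32) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (texts, labels) batches from a tab-separated file.

    Hypothetical helper for illustration only; assumes one
    "text<TAB>label" example per line.
    """
    texts: List[str] = []
    labels: List[str] = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 2:
                continue  # skip empty or malformed lines
            texts.append(fields[0])
            labels.append(fields[1])
            if len(texts) == batch_size:
                yield texts, labels
                texts, labels = [], []
    # Emit the final, possibly smaller, batch
    if texts:
        yield texts, labels
```

A training loop would then iterate over `stream_batches("train.tsv")` (a hypothetical file name) and feed each batch to the model. In a Keras setting, the same pattern is commonly wrapped in a `keras.utils.Sequence` subclass or a Python generator passed to `fit`.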