RAIN NLP BENCHMARK
With the increasing availability of textual data and improved model capabilities, Natural Language Processing (NLP) is gaining wider adoption in industry. However, the field is guided mainly by research-motivated benchmarks, which by their nature can fail to adequately measure the real-world utility of NLP applications.
At Mishcon de Reya we believe that, in addition to existing benchmarks, application-focused benchmarks will help guide us towards vastly improved Natural Language Understanding (NLU) for industrial applications.
To facilitate greater focus on NLP in industrial settings, we introduce the Real-World Applied Industrial NLP (RAIN) benchmark – a collection of NLP tasks and corresponding datasets with broad practical application.
We formulate several NLP tasks, make available new datasets for each, totalling over 150,000 annotations, and provide evaluations of baseline and task-specific models. At the time of release (Q4 2021), we observe a headroom gap of 19.4% between model and human performance on the overall score.
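One common convention, which we sketch below purely for illustration, is to compute the overall score as a macro-average of per-task scores and the headroom gap as the difference between human and model overall scores. The task names and score values here are hypothetical, not RAIN data; they are chosen only so the resulting gap matches the 19.4% figure above.

```python
# Hypothetical sketch of an overall benchmark score and headroom gap.
# Task names and scores are illustrative, not actual RAIN results.

def overall_score(task_scores: dict[str, float]) -> float:
    """Macro-average of per-task scores (one common aggregation choice)."""
    return sum(task_scores.values()) / len(task_scores)

def headroom_gap(human_scores: dict[str, float],
                 model_scores: dict[str, float]) -> float:
    """Difference between human and model performance on the overall score."""
    return overall_score(human_scores) - overall_score(model_scores)

# Illustrative values only.
human = {"task_a": 95.0, "task_b": 90.0}
model = {"task_a": 78.0, "task_b": 68.2}

gap = headroom_gap(human, model)  # human mean 92.5, model mean 73.1
```

Under these assumed numbers, the gap works out to 19.4 percentage points, mirroring the reported headroom.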
We will maintain the benchmark on an ongoing basis, updating existing datasets and adding new tasks.