Training Corpus Released for BARR2 track - IberEval 2018

Second Biomedical Abbreviation Recognition and Resolution track (BARR2)

BARR2 track workshop at SEPLN2018, September 18, Seville, Spain

http://temu.bsc.es/BARR2/

Overview

Finding and resolving abbreviations and symbols is a critical task for search engines and information retrieval, text classification, named entity recognition, and even machine translation systems.

Moreover, approaches to recognize and resolve abbreviations can often be directly adapted across different languages, resulting in resources of both widespread use and high impact. Nevertheless, due to the lack of exhaustively manually annotated abbreviation resolution corpora, in particular for certain key domains, evaluating and improving abbreviation resolution systems is still an active field of research.

In the case of biomedical clinical texts, abbreviations are particularly frequent and often refer to important entities and concepts such as diseases, treatments, symptoms, drugs, or biomolecular entities. Clinical NLP systems therefore require correct abbreviation recognition and resolution.

Building on the success of the first BARR track, held at IberEval 2017, which covered the detection of short form – long form relations in the medical literature, the second BARR track requires the detection of abbreviations in clinical case reports written in Spanish. With an estimated 500 million or more Spanish speakers worldwide, recognition and resolution of abbreviations in Spanish clinical texts is an important task for Spanish clinical NLP tools.

The BARR2 track will be structured into two sub-tracks, namely:

  • Sub-track 1: participating teams provide systems able to detect only explicit occurrences of abbreviation-definition pairs
  • Sub-track 2: participating teams provide resolutions of short forms regardless of whether their definitions are mentioned within the document itself
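
To give a rough idea of what an explicit abbreviation-definition pair looks like, the following is a deliberately naive sketch and not the track's official method or baseline. It assumes that the short form appears in parentheses right after its long form and that the initials of the long-form words spell the short form. The function name `find_explicit_pairs`, the small stopword list, and the example sentence are all made up for illustration.

```python
import re

# Hypothetical, minimal list of Spanish function words that may appear
# inside a long form without contributing a letter to the short form.
STOPWORDS = {"de", "del", "la", "el", "los", "las", "y", "en"}


def find_explicit_pairs(text):
    """Return (short_form, long_form) pairs of the shape 'long form (SF)'.

    Naive heuristic: walk backwards from the parenthesis and match the
    first letter of each preceding word against the short form letters.
    """
    pairs = []
    for match in re.finditer(r"\(([A-ZÁÉÍÓÚÑ]{2,10})\)", text):
        short_form = match.group(1)
        words = re.findall(r"\w+", text[:match.start()])
        remaining = list(short_form.lower())  # letters still to be matched
        candidate = []
        for word in reversed(words):
            if not remaining:
                break
            if word.lower() in STOPWORDS:
                candidate.append(word)  # keep function words, match nothing
            elif word[0].lower() == remaining[-1]:
                remaining.pop()
                candidate.append(word)
            else:
                break
        if not remaining:  # every letter of the short form was matched
            pairs.append((short_form, " ".join(reversed(candidate))))
    return pairs


# Invented example sentence, not taken from the BARR2 corpus.
sentence = (
    "Paciente con infarto agudo de miocardio (IAM) al que se realiza "
    "tomografía axial computarizada (TAC)."
)
pairs = find_explicit_pairs(sentence)
print(pairs)
```

A heuristic like this only covers the explicit case of sub-track 1. Short forms whose definition is absent from the document, as in sub-track 2, need some external resource or model to be resolved.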

The primary evaluation metrics used for the BARR2 track will be precision, recall, and F-score of the predictions against the manual gold standard. Additional details, sample sets, FAQ, and registration details can be found at the BARR2 track URL: http://temu.bsc.es/BARR2
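
As an illustration of how precision, recall, and F-score relate to each other, the sketch below compares a set of predictions with a gold standard using exact matching. It is not the official evaluation script: the tuple layout (document id, short form, long form), the function name `precision_recall_f1`, the exact-match criterion, and the balanced F1 weighting are assumptions, and the example data is invented.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and balanced F-score over exact matches."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented annotations: (document id, short form, long form).
gold = {
    ("doc1", "TAC", "tomografía axial computarizada"),
    ("doc1", "HTA", "hipertensión arterial"),
    ("doc2", "IAM", "infarto agudo de miocardio"),
}
predicted = {
    ("doc1", "TAC", "tomografía axial computarizada"),
    ("doc1", "HTA", "hipertensión"),  # wrong long form: counts as an error
    ("doc2", "IAM", "infarto agudo de miocardio"),
    ("doc2", "EPOC", "enfermedad pulmonar obstructiva crónica"),  # not in gold
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# prints: precision=0.500 recall=0.667 f1=0.571
```

Here two of the four predictions are correct (precision 0.5) and two of the three gold pairs are found (recall 2/3), which gives a balanced F-score of 4/7.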

Tentative track dates
20th April 2018   Release of sample data (sub-track 1 and sub-track 2)
17th May 2018   Training corpus available (sub-track 1 and sub-track 2)
23rd May 2018   Development corpus available (sub-track 1 and sub-track 2)
25th May 2018   Test corpus available (sub-track 1 and sub-track 2)
10th June 2018 (latest 23:55 CET)   Submission of team predictions (sub-track 1 and sub-track 2)
13th June 2018   Publication of results (sub-track 1 and sub-track 2)
20th June 2018   Working notes submission
23rd June 2018   Release of the working notes reviews
2nd July 2018   Camera ready paper submission
18th September 2018   IberEval 2018 Workshop
Organizers
  • Martin Krallinger, Biological Text Mining Unit (Bio-TeMUC), CNIO, Spain
  • Alfonso Valencia, Life Sciences Department Director, BSC, Spain
  • Nuria Bel, Department of Translation and Language Sciences, UPF, Spain
  • Ander Intxaurrondo, Biological Text Mining Unit (Bio-TeMUC), CNIO, Spain
  • Aitor Gonzalez-Aguirre, Biological Text Mining Unit (Bio-TeMUC), CNIO, Spain
  • Marta Villegas, Barcelona Supercomputing Center (BIO-TeMUC), CNIO, Spain
  • Jose A. Lopez Martin, Medical Oncology, Hospital 12 de Octubre, Spain
  • Montserrat Marimon, Barcelona Supercomputing Center (Bio-TeMUC), BSC, Spain
Scientific Advisory Board
  • Saber Ahmad Akhondi, Principal NLP Scientist, Elsevier Content & Innovation
  • Sophia Ananiadou, Professor of the School of Computer Science, University of Manchester / Director of the National Centre for Text Mining (NaCTeM), UK
  • Marius Doornenbal, Chief NLP Scientist, Elsevier Content & Innovation
  • Fernando A. Navarro, MD, Cosnauta, Siglas médicas en español; Founding member of TREMÉDICA, Spain
  • Carlos Luis Parra Calderón, Head of Technological Innovation Section, University Hospital Virgen del Rocio; Director of biomedical informatics, Instituto de Biomedicina de Sevilla, Spain
  • Hua Xu, Director of the Center for Computational Biomedicine, University of Texas Health Science Center at Houston, USA