Call for Participation: DrugProt Shared Task (BioCreative VII Track 1, 2021)

Event start date
08-11-2021 09:00
Event end date
10-11-2021 17:00

DrugProt Shared Task (BioCreative VII Track 1, 2021)

Text mining drug-protein/gene interactions (DrugProt) shared task


Previous BioCreative efforts have provided highly relevant resources for advancing biomedical text mining research, including datasets and system evaluations in both shared and interactive modes (e.g. BioBERT benchmark datasets, BioC format, ChemProt corpus, CHEMDNER corpus, etc.).

We are organizing the DrugProt track specifically focusing on the large-scale automatic extraction of relations between drugs or chemical compounds and genes/proteins of interest for drug discovery and biomedicine.

It is becoming increasingly challenging to efficiently exploit the drug-related information described in the growing body of scientific literature. There is a range of different types of drug-gene/protein interactions, and their systematic extraction and characterization are essential to analyze, predict and explore key biomedical properties underlying high-impact biomedical applications.

We foresee that the DrugProt track will promote the development of NLP techniques to extract critical health information, generating results useful for:

  • Drug discovery, drug repurposing & drug design
  • Metabolic engineering, modeling drug response, pharmacogenetics
  • Drug-induced adverse reactions, off-target interactions
  • Molecular medicine, systems biology and bioinformatics
  • Biomedical knowledge graph mining

Therefore, the DrugProt organizers will release a large training corpus of manually annotated entity mentions for drugs/chemicals and genes/proteins, together with their interactions (13 different types of interactions).

Participating DrugProt teams will be provided with the following training corpus:

  • PubMed abstracts (3500)
  • Manually annotated chemical compound mentions (> 46200)
  • Manually annotated gene/protein mentions (> 43200)
  • Manually annotated drug/chemical-protein/gene interactions (> 17200)

Key information:

Evaluation will use the micro-averaged F-measure, comparing the automatically extracted relations against manually labelled Gold Standard relations.
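
As an informal illustration of the metric named above, the following sketch computes micro-averaged precision, recall and F-measure by pooling true positives, false positives and false negatives over all relation types. It is not the official evaluation script. The tuple layout (document ID, relation type, chemical ID, gene ID), the function name `micro_f_measure`, and the example labels and identifiers are all assumptions made for this example.

```python
from typing import Iterable, Tuple

# Assumed layout for one relation (hypothetical, for illustration only):
# (document_id, relation_type, chemical_entity_id, gene_entity_id)
Relation = Tuple[str, str, str, str]


def micro_f_measure(
    gold: Iterable[Relation], predicted: Iterable[Relation]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), pooled over all relation types."""
    gold_set = set(gold)
    pred_set = set(predicted)

    # A prediction counts as correct only if it matches a gold relation exactly.
    tp = len(gold_set & pred_set)
    fp = len(pred_set - gold_set)
    fn = len(gold_set - pred_set)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: the IDs and labels below are made up for the example.
gold = [
    ("doc1", "INHIBITOR", "T1", "T5"),
    ("doc1", "SUBSTRATE", "T2", "T6"),
    ("doc2", "ACTIVATOR", "T1", "T3"),
]
predicted = [
    ("doc1", "INHIBITOR", "T1", "T5"),   # correct
    ("doc1", "ACTIVATOR", "T2", "T6"),   # wrong relation type
    ("doc2", "ACTIVATOR", "T1", "T3"),   # correct
]

precision, recall, f1 = micro_f_measure(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because the counts are pooled before the scores are computed, frequent relation types weigh more heavily in the final score than rare ones. That pooling is what distinguishes micro-averaging from macro-averaging.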

Important dates

  • Training set release (UPDATED!) - June 15th 2021: DrugProt training set
  • Test set release - July 5th 2021: Test set abstracts and entity annotations
  • Test set prediction submission due: August 30th 2021
  • Test set evaluation returned to participants: September 3rd 2021
  • Workshop proceedings paper due: September 10th 2021
  • Paper acceptance and review returned: September 16th 2021
  • Test set Gold Standard annotations to participants: September 27th 2021
  • BioCreative VII Workshop: November 8th-10th, 2021. (This workshop will be virtual).

BioCreative VII workshop proceedings and Journal Special Issue

Participating teams will be invited to contribute to the Proceedings of the Seventh BioCreative Challenge Evaluation Workshop. Proceedings papers are free of charge. A selected number of top-performing teams will also be invited to contribute a longer system description paper to a special issue on BioCreative VII, to be published in the journal Database.

Task organizers:

  • Martin Krallinger, Barcelona Supercomputing Center, Spain
  • Antonio Miranda, Barcelona Supercomputing Center, Spain
  • Farrokh Mehryary, University of Turku, Finland
  • Jouni Luoma, University of Turku, Finland
  • Sampo Pyysalo, University of Turku, Finland
  • Alfonso Valencia, Barcelona Supercomputing Center, Spain

