English dictionaries, gold and silver standard corpora for biomedical natural language processing related to SARS-CoV-2 and COVID-19

Research output: Other contribution › Miscellaneous

Abstract

Here we present a toolbox for natural language processing (NLP) tasks related to SARS-CoV-2. It comprises English dictionaries of synonyms for SARS-CoV-2 and COVID-19, a silver standard corpus generated with these dictionaries, and a gold standard corpus of 10 PubMed abstracts manually annotated for disease, virus, symptom and protein/gene terms. The toolbox is freely available on GitHub and can be used for text analytics in a variety of settings related to the COVID-19 crisis. It will be expanded and applied in NLP tasks over the coming weeks, and the community is invited to contribute.
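The abstract states that the silver standard corpus was generated with the dictionaries. The sketch below shows one common way dictionary-based tagging of this kind can work: case-insensitive, longest-match lookup of synonyms in text, emitting labelled character spans. It assumes a simple term-to-label mapping; the `SYNONYMS` entries are a small hypothetical sample (not the released dictionaries), the function names are illustrative, and this is not necessarily the procedure the authors used.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-dictionary for illustration only; NOT the released dictionaries.
SYNONYMS: Dict[str, str] = {
    "SARS-CoV-2": "VIRUS",
    "2019-nCoV": "VIRUS",
    "severe acute respiratory syndrome coronavirus 2": "VIRUS",
    "COVID-19": "DISEASE",
    "coronavirus disease 2019": "DISEASE",
}


def build_pattern(terms) -> re.Pattern:
    # Python tries alternatives left to right, so sort longest first:
    # at any position the longest matching synonym wins over shorter ones.
    alternatives = sorted(terms, key=len, reverse=True)
    escaped = "|".join(re.escape(t) for t in alternatives)
    # Lookarounds stop a match from starting or ending inside a larger word.
    return re.compile(rf"(?<!\w)(?:{escaped})(?!\w)", re.IGNORECASE)


def annotate(text: str, synonyms: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, surface form, label) for each dictionary match."""
    # Lower-cased lookup table, because matching is case-insensitive.
    lookup = {term.lower(): label for term, label in synonyms.items()}
    pattern = build_pattern(synonyms)
    spans = []
    for m in pattern.finditer(text):
        spans.append((m.start(), m.end(), m.group(0), lookup[m.group(0).lower()]))
    return spans


if __name__ == "__main__":
    sentence = "COVID-19 is caused by SARS-CoV-2, formerly called 2019-nCoV."
    for span in annotate(sentence, SYNONYMS):
        print(span)
```

Longest-first ordering matters when one synonym is a prefix of another; a real pipeline would also need to decide how to handle overlapping entries across entity types and how to serialise the spans, which the record above does not specify.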

Details


Subject classification (UKÄ)

  • Other Medical Sciences not elsewhere specified

Keywords

  • SARS-CoV-2, COVID-19, Text mining, BioNLP, Artificial Intelligence
Original language: English
Publication status: Published, 2020 Mar 22
Publication category: Research

Publication series

Name: arXiv
Publisher: Cornell University Library

Related projects

Sonja Aits, Johan Frid, Pierre Nugues, Salma Kazemi Rashed, Jong Chan Lim & Marcus Klang

2020/01/20 → …

Project: Research
