English dictionaries, gold and silver standard corpora for biomedical natural language processing related to SARS-CoV-2 and COVID-19

Research output: Other contribution › Miscellaneous

RIS

TY - GEN

T1 - English dictionaries, gold and silver standard corpora for biomedical natural language processing related to SARS-CoV-2 and COVID-19

AU - Kazemi Rashed, Salma

AU - Frid, Johan

AU - Lim, Jong Chan

AU - Aits, Sonja

PY - 2020/3/22

Y1 - 2020/3/22

N2 - Here we present a toolbox for natural language processing tasks related to SARS-CoV-2. It comprises English dictionaries of synonyms for SARS-CoV-2 and COVID-19, a silver standard corpus generated with the dictionaries and a gold standard corpus of 10 PubMed abstracts manually annotated for disease, virus, symptom and protein/gene terms. This toolbox is freely available on GitHub and can be used for text analytics in a variety of settings related to the COVID-19 crisis. It will be expanded and applied in NLP tasks over the next weeks and the community is invited to contribute.

AB - Here we present a toolbox for natural language processing tasks related to SARS-CoV-2. It comprises English dictionaries of synonyms for SARS-CoV-2 and COVID-19, a silver standard corpus generated with the dictionaries and a gold standard corpus of 10 PubMed abstracts manually annotated for disease, virus, symptom and protein/gene terms. This toolbox is freely available on GitHub and can be used for text analytics in a variety of settings related to the COVID-19 crisis. It will be expanded and applied in NLP tasks over the next weeks and the community is invited to contribute.

KW - SARS-CoV-2

KW - COVID-19

KW - Text mining

KW - BioNLP

KW - Artificial Intelligence

M3 - Miscellaneous

T3 - ARXIV

ER -
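The abstract states that the silver standard corpus was generated with the synonym dictionaries but does not describe the matching procedure. The sketch below illustrates one common way to do dictionary-based annotation, assuming a case-insensitive, longest-match lookup with word boundaries. It is not the authors' implementation: `DICTIONARY`, `build_pattern` and `annotate` are hypothetical names, and the handful of entries are illustrative synonyms, not the contents of the toolbox's dictionaries. The labels "Virus" and "Disease" mirror two of the gold standard annotation categories.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, illustrative subset of a synonym dictionary (term -> label).
# The toolbox's actual dictionaries are larger and may be organized differently.
DICTIONARY: Dict[str, str] = {
    "SARS-CoV-2": "Virus",
    "2019-nCoV": "Virus",
    "severe acute respiratory syndrome coronavirus 2": "Virus",
    "COVID-19": "Disease",
    "coronavirus disease 2019": "Disease",
}


def build_pattern(dictionary: Dict[str, str]) -> re.Pattern:
    """Compile one case-insensitive regex matching any dictionary term."""
    # Longest terms first, so the alternation prefers the longest synonym
    # starting at a given position (e.g. the full virus name over a substring).
    terms = sorted(dictionary, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    # Lookarounds stop matches inside longer words, e.g. "COVID-19x".
    return re.compile(r"(?<!\w)(" + alternation + r")(?!\w)", re.IGNORECASE)


def annotate(text: str, dictionary: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, surface form, label) for every dictionary hit."""
    pattern = build_pattern(dictionary)
    # Map lower-cased terms back to labels, since matching ignores case.
    lookup = {term.lower(): label for term, label in dictionary.items()}
    spans = []
    for match in pattern.finditer(text):
        surface = match.group(1)
        spans.append((match.start(1), match.end(1), surface, lookup[surface.lower()]))
    return spans


if __name__ == "__main__":
    sentence = ("Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) "
                "causes COVID-19.")
    for start, end, surface, label in annotate(sentence, DICTIONARY):
        print(f"{start}\t{end}\t{label}\t{surface}")
```

Because the annotations come from automatic matching rather than manual curation, a corpus built this way is a silver standard: it can miss synonyms absent from the dictionary and mislabel ambiguous strings, which is why a manually annotated gold standard is useful for evaluation.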