English dictionaries, gold and silver standard corpora for biomedical natural language processing related to SARS-CoV-2 and COVID-19

Research output: Working paper/Preprint › Preprint (in preprint archive)

Abstract

Here we present a toolbox for natural language processing tasks related to SARS-CoV-2. It comprises English dictionaries of synonyms for SARS-CoV-2 and COVID-19, a silver standard corpus generated with the dictionaries, and a gold standard corpus of 10 PubMed abstracts manually annotated for disease, virus, symptom and protein/gene terms. This toolbox is freely available on GitHub and can be used for text analytics in a variety of settings related to the COVID-19 crisis. It will be expanded and applied in NLP tasks over the coming weeks, and the community is invited to contribute.
Original language: English
Publisher: arXiv.org
Status: Published - 22 March 2020
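
The abstract describes a silver standard corpus generated with the synonym dictionaries. As a rough illustration of what dictionary-based annotation can look like, the sketch below tags synonym matches in a sentence. It is not the toolbox's own code: the `SYNONYMS` entries, the labels, and the `build_pattern` and `annotate` helpers are hypothetical names chosen for this example, and the real dictionaries are considerably larger.

```python
import re

# Hypothetical synonym lists for illustration only; not the toolbox's actual dictionaries.
SYNONYMS = {
    "virus": ["SARS-CoV-2", "2019-nCoV", "novel coronavirus"],
    "disease": ["COVID-19", "coronavirus disease 2019"],
}


def build_pattern(terms):
    """Compile one case-insensitive pattern that matches any of the given terms."""
    # Longest terms first, so a longer synonym wins over a shorter one it contains.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # Lookarounds reject matches embedded inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


PATTERNS = {label: build_pattern(terms) for label, terms in SYNONYMS.items()}


def annotate(text):
    """Return (start, end, label, matched_text) tuples sorted by position."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), label, match.group(0)))
    return sorted(hits)


example = "SARS-CoV-2 is the virus that causes COVID-19."
annotations = annotate(example)
for start, end, label, matched in annotations:
    print(start, end, label, matched)
```

Annotations produced this way are only as good as the dictionary, which is why a manually annotated gold standard corpus is useful for checking them.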

Subject classification (UKÄ)

  • Language Technology (Computational Linguistics)
  • Other Medical and Health Sciences not elsewhere specified
  • Microbiology in the medical area
