Article

Survey of BERT-Base Models for Scientific Text Classification: COVID-19 Case Study

Mayara Khadhraoui, National Engineering School of Sfax (ENIS), University of Sfax, Sfax 3038, Tunisia
Hatem Bellâaj, ReDCAD Laboratory, Department of Computer Engineering and Applied Mathematics, University of Sfax, Sfax 3029, Tunisia
Mehdi Ben Ammar, Faculty of Engineering, Université de Moncton, Moncton, NB E1A 3E9, Canada
Habib Hamam, Department of Electrical and Electronic Engineering Science, School of Electrical Engineering, University of Johannesburg, Johannesburg 2006, South Africa
Mohamed Jmaïel, ReDCAD Laboratory, Department of Computer Engineering and Applied Mathematics, University of Sfax, Sfax 3029, Tunisia

2022, en

Abstract

On 30 January 2020, the World Health Organization announced a new coronavirus, which later proved to be highly dangerous. COVID-19 has since spread into a pandemic affecting practically every region of the world, and many medical researchers have contributed to the fight against it. Given the rapid growth of scientific publications related to this pandemic, manual text and data retrieval has become a challenging task. To address this challenge, we propose CovBERT, a pre-trained language model based on BERT, to automate the literature review process. CovBERT is pre-trained on a large corpus of scientific publications in the biomedical domain and related to COVID-19 to improve its performance on the literature review task. We evaluate CovBERT on short-text classification using COV-Dat-20, our scientific dataset of biomedical articles on COVID-19. We demonstrate statistically significant improvements by using BERT.

Identifiers

DOI: 10.3390/app12062891

Citations and references

Citations: 2
References used: 0

Metrics: AkademScholar