Large-scale identification of undiagnosed hepatic steatosis using natural language processing

Carolin V. Schneider (Department of Medicine III, RWTH Aachen University, Aachen, Germany)
Tang Li (Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA)
David Zhang (Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA)
Anya Mezina (Division of Gastroenterology and Hepatology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA)
Puru Rattan (Division of Gastroenterology and Hepatology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA)
Helen Huang (Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA)
Kate Townsend Creasy (Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA)
Eleonora Scorletti (Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA)
Inuk Zandvakili (Division of Digestive Diseases, Department of Internal Medicine, College of Medicine, University of Cincinnati, Cincinnati, OH 45267, USA)
Marijana Vujković (Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA 19104, USA)
Leonida Hehl (Department of Medicine III, RWTH Aachen University, Aachen, Germany)
Jacob Fiksel (Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA)
Joseph Park (Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA)
Kirk J. Wangensteen (Department of Medicine, Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN 55902, USA)
Marjorie Risman (Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA)
Kyong‐Mi Chang (Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA 19104, USA)
Marina Serper (Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA 19104, USA)
Rotonya M. Carr (Department of Medicine, Division of Gastroenterology, University of Washington, Seattle, WA 98195, USA)
Kai Markus Schneider (Department of Medicine III, RWTH Aachen University, Aachen, Germany)
Jinbo Chen (Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA)
Daniel J. Rader (Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA)
2023 (English)
ABI

Abstract

Background: Nonalcoholic fatty liver disease (NAFLD) is a major cause of liver-related morbidity in people with and without diabetes, but it is underdiagnosed, which poses challenges for research and clinical management. Here, we determine whether natural language processing (NLP) of data in the electronic health record (EHR) can identify undiagnosed patients with hepatic steatosis based on pathology and radiology reports.

Methods: A rule-based NLP algorithm was built using a Linguamatics literature text-mining tool to search 2.15 million pathology reports and 2.7 million imaging reports in the Penn Medicine EHR from November 2014 through December 2020 for evidence of hepatic steatosis. For quality control, two independent physicians manually reviewed randomly chosen biopsy and imaging reports (n = 353; positive predictive value [PPV] 99.7%).

Findings: After exclusion of individuals with other causes of hepatic steatosis, 3007 patients with biopsy-proven NAFLD and 42,083 patients with imaging-proven NAFLD were identified. Elevated alanine aminotransferase (ALT) was not a sensitive predictor of the presence of steatosis, and only half of the biopsied patients with steatosis ever received an ICD diagnosis code for NAFLD/NASH. PNPLA3 and TM6SF2 risk alleles were robustly associated with steatosis identified by NLP. We identified 234 disorders that were significantly over- or underrepresented in all subjects with steatosis, as well as changes in serum markers (e.g., GGT) associated with the presence of steatosis.

Interpretation: This study demonstrates the clear feasibility of NLP-based approaches for identifying patients whose steatosis was indicated in imaging and pathology reports within a large healthcare system, and it reveals undercoding of NAFLD in the general population. Identifying patients at risk could link them to improved care and outcomes.

Funding: The study was funded by US and German sources that provided financial support only and had no influence on or control over the research process.
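The Methods describe a rule-based NLP search of free-text reports. The study used the commercial Linguamatics tool, whose queries are not given here, so the following is only a minimal, hypothetical Python sketch of the general rule-based idea, not the authors' algorithm: match steatosis terms sentence by sentence and discard mentions preceded by a negation cue. The term lists, the five-token negation window, and all function names are illustrative assumptions. The `ppv` and `sensitivity` helpers show the standard definitions behind the reported PPV and the remark that elevated ALT was not a sensitive predictor.

```python
import re

# Hypothetical vocabulary for illustration only; the study's actual
# Linguamatics queries are not given in the abstract.
STEATOSIS_TERMS = ["hepatic steatosis", "steatosis", "fatty liver"]
NEGATION_CUES = ["no", "without", "negative for", "free of"]
NEGATION_WINDOW = 5  # hypothetical: number of tokens to look back for a negation cue

STEATOSIS_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, STEATOSIS_TERMS)) + r")\b", re.IGNORECASE
)
NEGATION_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, NEGATION_CUES)) + r")\b", re.IGNORECASE
)


def sentence_mentions_steatosis(sentence: str) -> bool:
    """True if a steatosis term appears without a negation cue just before it."""
    for match in STEATOSIS_RE.finditer(sentence):
        # Keep only the last few tokens preceding the matched term.
        preceding = sentence[: match.start()].split()[-NEGATION_WINDOW:]
        if not NEGATION_RE.search(" ".join(preceding)):
            return True
    return False


def report_indicates_steatosis(report: str) -> bool:
    """Flag a free-text report if any sentence affirms steatosis."""
    # Naive sentence split on '.', ';' or newlines (it also splits decimals such as "1.5 cm").
    sentences = re.split(r"[.;\n]+", report)
    return any(sentence_mentions_steatosis(s) for s in sentences)


def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: fraction of flagged cases confirmed on review."""
    return true_positives / (true_positives + false_positives)


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Sensitivity: fraction of truly positive cases that a test flags."""
    return true_positives / (true_positives + false_negatives)


if __name__ == "__main__":
    print(report_indicates_steatosis("Liver: moderate macrovesicular steatosis."))  # True
    print(report_indicates_steatosis("No evidence of hepatic steatosis."))  # False
```

A keyword-plus-negation design like this is a transparent baseline for clinical text, but it is crude: it treats hedged phrasing such as "cannot exclude steatosis" as a positive finding and misses any wording the term list omits. Errors of the first kind are what a manual review of flagged reports, like the one in the Methods, surfaces in the PPV.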