Phenotyping coronavirus disease 2019 during a global health pandemic: Lessons learned from the characterization of an early cohort

Sarah DeLozier; Sarah Bland; Melissa McPheeters; Quinn Wells; Eric Farber-Eger; Cosmin A. Bejan; Daniel Fabbri; Trent Rosenbloom; Dan Roden; Kevin B. Johnson; Wei-Qi Wei; Josh Peterson; Lisa Bastarache

DeLozier, S., Bland, S., McPheeters, M., Wells, Q., Farber-Eger, E., Bejan, C. A., Fabbri, D., Rosenbloom, T., Roden, D., Johnson, K. B., Wei, W.-Q., Peterson, J., & Bastarache, L. (2021). Phenotyping coronavirus disease 2019 during a global health pandemic: Lessons learned from the characterization of an early cohort. Journal of Biomedical Informatics, 117, Article 103777. https://doi.org/10.1016/j.jbi.2021.103777

Abstract

From the start of the coronavirus disease 2019 (COVID-19) pandemic, researchers have looked to electronic health record (EHR) data as a way to study possible risk factors and outcomes. To ensure the validity of research using these data, investigators need to be confident that the phenotypes they construct are reliable and accurate and that they reflect the healthcare settings from which they are ascertained. We developed a COVID-19 registry at a single academic medical center and used data from March 1 to June 5, 2020 to assess differences in population-level characteristics between pandemic and non-pandemic years. Median EHR length, previously shown to affect phenotype performance in type 2 diabetes, was significantly shorter in the SARS-CoV-2-positive group than in a group tested for influenza in 2019 (median 3.1 vs 8.7 years; Wilcoxon rank-sum P = 1.3e-52). Using three phenotyping methods of increasing complexity (billing codes alone, and domain-specific algorithms provided by an EHR vendor and by clinical experts), we abstracted common medical comorbidities from the EHRs of COVID-19 patients, who were identified by the presence of a positive laboratory test (positive predictive value 100%, recall 93%). After combining performance data across phenotyping methods, we observed significantly lower false negative rates for records billed for a comprehensive care visit (P = 4e-11) and for records with complete demographic data (P = 7e-5). In an early COVID-19 cohort, we found that phenotyping performance for nine common comorbidities was influenced by median EHR length, consistent with previous studies, as well as by data density, which can be measured using portable metrics such as CPT codes. Here we present the challenges of creating deeply phenotyped, acute COVID-19 cohorts, along with potential solutions.

Meet the Experts

Melissa McPheeters

Recent Publications

Article

Plain language summary of mortality rates of patients with Parkinson’s disease psychosis who were treated either with pimavanserin or with different second-generation (atypical) antipsychotics

December 31, 2025

Article

US consumer and healthcare professional preferences for combination COVID-19 and influenza vaccines

December 31, 2025

Article

Intestinal E. coli-produced yersiniabactin promotes profibrotic macrophages in Crohn's disease

December 08, 2025

Article

Advances in analytical methodologies for detecting novel psychoactive substances

April 01, 2025