Finding long-COVID: Temporal topic modeling of electronic health records from the N3C and RECOVER programs
O'Neil, S. T., Madlock-Brown, C., Wilkins, K. J., McGrath, B. M., Davis, H. E., Assaf, G. S., Wei, H., Zareie, P., French, E. T., Loomba, J., McMurry, J. A., Zhou, A., Chute, C. G., Moffitt, R. A., Pfaff, E. R., Yoo, Y. J., Leese, P., Chew, R. F., Lieberman, M., & Haendel, M. A. (2024). Finding long-COVID: Temporal topic modeling of electronic health records from the N3C and RECOVER programs. medRxiv: the preprint server for health sciences. https://doi.org/10.1101/2023.09.11.23295259
Post-Acute Sequelae of SARS-CoV-2 infection (PASC), also known as Long-COVID, encompasses a range of complex outcomes following COVID-19 infection that remain poorly understood. We clustered over 600 million condition diagnoses from 14 million patients available through the National COVID Cohort Collaborative (N3C), generating hundreds of highly detailed clinical phenotypes. Assessing patient clinical trajectories using these clusters allowed us to identify individual conditions and phenotypes that increased strongly after acute infection. We found that many conditions were increased in COVID-19 patients compared to controls. Using a novel method to associate patients with clusters over time, we also found phenotypes specific to patient sex, age, wave of infection, and PASC diagnosis status. While many of these results reflect known PASC symptoms, the resolution provided by this unprecedented data scale suggests avenues for improved diagnostics and mechanistic understanding of this multifaceted disease.