Machine learning for medical coding in healthcare surveys
Hadley, E. C., Chew, R., Nance, J. M., Baumgartner, P. M., Thissen, M. R., Plotner, D. M., Carr, C. M., & National Center for Health Statistics (U.S.) (2021). Machine learning for medical coding in healthcare surveys. Vital and health statistics. Series 2, Data evaluation and methods research, 2021(189), 1-22. https://doi.org/10.15620/cdc:109828
Objective: Medical coding, or the translation of healthcare information into numeric codes, is expensive and time-intensive. This exploratory study evaluates the use of machine learning classifiers to perform automated medical coding for large statistical healthcare surveys.

Methods: This research used medically coded data from the Emergency Department portion of the 2016 and 2017 National Hospital Ambulatory Medical Care Survey (NHAMCS-ED). Natural language processing classifiers were developed to assign medical codes using verbatim text from patient visits as inputs. Assigned codes included three-digit truncated codes from the International Statistical Classification of Diseases and Related Health Problems, 10th Revision, Clinical Modification (ICD-10-CM) for diagnoses (DIAG) and cause of injury (CAUSE), as well as full-length NCHS reason-for-visit (RFV) classification codes.

Results: The best-performing of the machine learning models assessed was a multilabel logistic regression. The Jaccard coefficient was used to measure agreement on the same set of codes, both between a model and a human coder (model-to-human) and between two human coders (human-to-human). Human-to-human agreement consistently exceeded model-to-human agreement; both were highest for diagnosis codes (human-to-human: 0.88; model-to-human: 0.78) and lowest for cause-of-injury codes (human-to-human: 0.50; model-to-human: 0.28). The model outperformed the human coders on 7.7% of the unique codes assigned by both the model and a human, with strong performance on specific truncated ICD-10-CM diagnosis codes.

Conclusion: This case study demonstrates the potential of machine learning for medical coding in the context of large statistical healthcare surveys. While trained medical coders outperformed the assessed models across the medical coding tasks of assigning correct diagnosis, injury, and RFV codes, machine learning models showed promise in assisting with medical coding projects, particularly if used as an adjunct to human coding.
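The approach the abstract describes (multilabel classification of verbatim visit text, evaluated with the Jaccard coefficient) can be sketched as follows. This is an illustrative reconstruction only: the study's actual features, preprocessing, and model parameters are not given in the abstract, and the texts and codes below are invented toy examples, not NHAMCS-ED data.

```python
# Hypothetical sketch, not the study's implementation: TF-IDF features over
# verbatim visit text, a one-vs-rest multilabel logistic regression, and a
# Jaccard coefficient to score agreement between two sets of assigned codes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy verbatim text and truncated ICD-10-CM-style diagnosis codes
# (invented for illustration).
texts = [
    "chest pain shortness of breath",
    "fell off ladder arm fracture",
    "chest pain dizziness",
    "arm fracture swelling",
]
codes = [["R07"], ["S52"], ["R07"], ["S52"]]

# Binarize the label sets so each code becomes one output column.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(codes)

# One independent logistic regression per code (multilabel one-vs-rest).
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(texts, y)

def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A & B| / |A | B| between two code sets
    (defined as 1.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Compare the model's predicted code set against a reference code set.
predicted = set(mlb.inverse_transform(model.predict(["chest pain"]))[0])
agreement = jaccard(predicted, {"R07"})
```

The same `jaccard` function scores either model-to-human or human-to-human agreement, since both reduce to comparing two sets of codes for the same visit.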