RTI uses cookies to offer you the best experience online. By clicking “accept” on this website, you opt in and you agree to the use of cookies. If you would like to know more about how RTI uses cookies and how to manage them please view our Privacy Policy here. You can “opt out” or change your mind by visiting: http://optout.aboutads.info/. Click “accept” to agree.
Deep learning detection of diabetic retinopathy in Scotland's diabetic eye screening programme
Fleming, A. D., Mellor, J., Mcgurnaghan, S. J., Blackbourn, L. A. K., Goatman, K. A., Styles, C., Storkey, A. J., Mckeigue, P. M., & Colhoun, H. M. (2024). Deep learning detection of diabetic retinopathy in Scotland's diabetic eye screening programme. British Journal of Ophthalmology, 108(7), 984-988. https://doi.org/10.1136/bjo-2023-323395
Background/Aims Support vector machine-based automated grading (known as iGradingM) has been shown to be safe, cost-effective and robust in the diabetic retinopathy (DR) screening (DES) programme in Scotland. It triages screening episodes as gradable with no DR versus manual grading required. The study aim was to develop a deep learning-based autograder using images and gradings from DES and to compare its performance with that of iGradingM.Methods Retinal images, quality assurance (QA) data and routine DR grades were obtained from national datasets in 179 944 patients for years 2006-2016. QA grades were available for 744 images. We developed a deep learning-based algorithm to detect whether either eye contained ungradable images or any DR. The sensitivity and specificity were evaluated against consensus QA grades and routine grades.Results Images used in QA which were ungradable or with DR were detected by deep learning with better specificity compared with manual graders (p<0.001) and with iGradingM (p<0.001) at the same sensitivities. Any DR according to the DES final grade was detected with 89.19% (270 392/303 154) sensitivity and 77.41% (500 945/647 158) specificity. Observable disease and referable disease were detected with sensitivities of 96.58% (16 613/17 201) and 98.48% (22 600/22 948), respectively. Overall, 43.84% of screening episodes would require manual grading.Conclusion A deep learning-based system for DR grading was evaluated in QA data and images from 11 years in 50% of people attending a national DR screening programme. The system could reduce the manual grading workload at the same sensitivity compared with the current automated grading system.