Assessing the accuracy of machine-assisted abstract screening with DistillerAI: A user study

Gerald Gartlehner; Gernot Wagner; Linda Lux; Lisa Affengruber; Andreea Dobrescu; Angela Kaminski-Hartenthaler; Meera Viswanathan

Assessing the accuracy of machine-assisted abstract screening with DistillerAI

A user study

Gartlehner, G., Wagner, G., Lux, L., Affengruber, L., Dobrescu, A., Kaminski-Hartenthaler, A., & Viswanathan, M. (2019). Assessing the accuracy of machine-assisted abstract screening with DistillerAI: A user study. Systematic Reviews, 8(1), Article 277. https://doi.org/10.1186/s13643-019-1221-3

Copy citation

Abstract

BACKGROUND: Web applications that employ natural language processing technologies to support systematic reviewers during abstract screening have become more common. The goal of our project was to conduct a case study to explore a screening approach that temporarily replaces a human screener with a semi-automated screening tool.

METHODS: We evaluated the accuracy of the approach using DistillerAI as a semi-automated screening tool. A published comparative effectiveness review served as the reference standard. Five teams of professional systematic reviewers screened the same 2472 abstracts in parallel. Each team trained DistillerAI with 300 randomly selected abstracts that the team screened dually. For all remaining abstracts, DistillerAI replaced one human screener and provided predictions about the relevance of records. A single reviewer also screened all remaining abstracts. A second human screener resolved conflicts between the single reviewer and DistillerAI. We compared the decisions of the machine-assisted approach, single-reviewer screening, and screening with DistillerAI alone against the reference standard.

RESULTS: The combined sensitivity of the machine-assisted screening approach across the five screening teams was 78% (95% confidence interval [CI], 66 to 90%), and the combined specificity was 95% (95% CI, 92 to 97%). By comparison, the sensitivity of single-reviewer screening was similar (78%; 95% CI, 66 to 89%); however, the sensitivity of DistillerAI alone was substantially worse (14%; 95% CI, 0 to 31%) than that of the machine-assisted screening approach. Specificities for single-reviewer screening and DistillerAI were 94% (95% CI, 91 to 97%) and 98% (95% CI, 97 to 100%), respectively. Machine-assisted screening and single-reviewer screening had similar areas under the curve (0.87 and 0.86, respectively); by contrast, the area under the curve for DistillerAI alone was just slightly better than chance (0.56). The interrater agreement between human screeners and DistillerAI with a prevalence-adjusted kappa was 0.85 (95% CI, 0.84 to 0.86%).

CONCLUSIONS: The accuracy of DistillerAI is not yet adequate to replace a human screener temporarily during abstract screening for systematic reviews. Rapid reviews, which do not require detecting the totality of the relevant evidence, may find semi-automation tools to have greater utility than traditional systematic reviews.

Publications Info

To contact an RTI author, request a report, or for additional information about publications by our experts, send us your request.

publications@rti.org

RTI shares its evidence-based research - through peer-reviewed publications and media - to ensure that it is accessible for others to build on, in line with our mission and scientific standards.

Meet the Experts

Navigate to Meera Viswanathan

Meera Viswanathan

Navigate to Gerald Gartlehner

Gerald Gartlehner

Recent Publications

Article

Patient-reported outcome improvements following scalp hair regrowth among patients with Alopecia Areata: analysis of the ALLEGRO-2b/3 trial

December 2025

Article

Plain language summary of mortality rates of patients with Parkinson’s disease psychosis who were treated either with pimavanserin or with different second-generation (atypical) antipsychotics

December 2025

Article

Biological parenthood rates among men with sickle cell disease

December 2025

Article

Patterns of felt stigma among rural-dwelling people who use drugs: A latent class analysis

December 2025

Article

One voice and vision: How the RISE network built a collective identity as the foundation for strategic dissemination

December 2025

Article

Estimating community-level prevalence of opioid use disorder: Extrapolating from Medicaid claims data and other publicly available data sources in Ohio, USA

December 2025

Article

Experiences of parents who receive a false-positive CK-MM screening for their newborn

December 2025

Article

Evaluating the efficacy and safety of milrinone for prevention of post-patent ductus arteriosus closure syndrome (the MIDAS trial) in extremely preterm infants: A multicentre, double-masked, randomised, placebo-controlled trial

December 2025

View All Publications