A semantic search engine leveraging peer-reviewed knowledge to query biomedical data repositories
Waldrop, A., Cheadle, J., Bradford, K. C., Preiss, S., Chew, R., Holt, J., Kebede, Y., Braswell, N., Watson, M., Hench, G., Crerar, A., Ball, C., Schreep, C., Linebaugh, PJ., Hiles, H., Boyles, R. R., Bizon, C., Krishnamurthy, A., & Cox, S. (2022). Dug: A semantic search engine leveraging peer-reviewed knowledge to query biomedical data repositories. Bioinformatics, 38(12), 3252-3258. https://doi.org/10.1093/bioinformatics/btac284
Motivation: As the number of public data resources continues to proliferate, identifying relevant datasets across heterogenous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets utilizing evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned.
Results: Developed through the National Heart, Lung and Blood Institute's (NHLBI) BioData Catalyst ecosystem, Dug has indexed more than 15 911 study variables from public datasets. On a manually curated search dataset, Dug's total recall (total relevant results/total results) of 0.79 outperformed default Elasticsearch's total recall of 0.76. When using synonyms or related concepts as search queries, Dug (0.36) far outperformed Elasticsearch (0.14) in terms of total recall with no significant loss in the precision of its top results.