The Benefits of Using Generative AI in Genomics Research
In research, genomics is leveraged to gain biological insights that can improve the health and well-being of all people. Throughout this process, researchers are repeatedly challenged to interpret large-scale genomic data efficiently. Manually interpreting the key genes for a disease or phenotype is time-consuming, particularly when a study yields hundreds or thousands of genes. In an era when researchers are expected to find answers to some of our toughest public health challenges, it is critical to work in a timely manner while maintaining scientific rigor and fostering discovery.
To address the need for an efficient and reliable approach to genomic data interpretation, RTI International developed Context GenI AI. This innovative tool, powered by artificial intelligence (AI), searches the literature to provide a structured summary of input genes, their known associations, and potential novel connections related to a specific disease or phenotype. Context GenI AI aims to accelerate project timelines, reduce costs, and enhance research output, enabling us to better support our clients and collaborators in current and future work.
Using LLMs for Genomics Research Efficiency
In recent years, AI development has progressed significantly, and AI tools are increasingly being adopted across the research landscape. Large language models (LLMs) have become especially popular, supporting use cases such as converting text into simpler, more accessible language, streamlining analyses, and improving the accuracy of data extraction. RTI experts are finding ways to capitalize on these tools while ensuring that they are used responsibly and with careful consideration.
Over the past year, our multidisciplinary team, with backgrounds in genetics, data science, and programming, has been developing Context GenI AI to help researchers interpret the results of their genomic data analyses. The tool combines PubMed with LLMs to perform a literature review on genes of interest in the context of a given phenotype or disease. It can save researchers anywhere from hours to weeks by summarizing each gene's functions in that context and by clustering genes with similar functions to help structure their interpretations. In addition, for each response, Context GenI AI can highlight the relevant paragraph(s) in each supporting article so users can easily click through and read further. Together, these features can shorten the path from data interpretation to research publication.
How Does Context GenI AI Work in Genomics Research?
After spending months perfecting an analysis and generating a list of genes, researchers typically perform enrichment analyses and review the literature most relevant to their research question. However, enrichment analyses are often too general and not context specific, and literature reviews are time-consuming and labor-intensive. RTI created Context GenI AI to address these challenges and give researchers an alternative.
In our application, users first enter the gene identifiers and the phenotypes they are most interested in studying. The query structure is flexible, matching articles that discuss a gene in relation to any or all of the specified phenotypes. Once submitted, the query is sent to PubMed through its application programming interface (API), which returns a list of relevant articles. The BioC API is then used to retrieve the full text of those articles using the identifiers returned by PubMed. To limit the number of articles used, the application provides a slider that lets users choose how many PubMed articles to download and pass to retrieval-augmented generation (RAG).
Articles that are successfully parsed are sent to the RAG pipeline for preprocessing, retrieval, and answer generation. In our trials, runtimes varied with the available hardware, but generating an answer typically took anywhere from a few seconds to a few minutes. The LLM's answers, along with the relevant article chunks obtained during retrieval, are displayed to the user in expandable information cards. If the search includes multiple genes, results stream in as they are generated. Users can expand each card to view the relevant text chunks and the PubMed search query, and they can open a results table that can be downloaded as a CSV file.
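The query-and-fetch step described above can be sketched as follows. This is a minimal illustration, not Context GenI AI's code: it assumes NCBI's public E-utilities `esearch` endpoint for PubMed and the BioC API for the PMC Open Access subset, which accepts a PubMed ID. The `[tiab]` (title/abstract) field tags, the `match_all` switch for "any" versus "all" phenotypes, and every function name are hypothetical choices; `max_articles` stands in for the slider value.

```python
# Hypothetical sketch: query syntax and function names are illustrative only.
import json
import urllib.parse
import urllib.request
from typing import Optional

# Public NCBI endpoints (assumed; check NCBI usage policies and rate limits).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
BIOC_URL = ("https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/"
            "pmcoa.cgi/BioC_json/{id}/unicode")


def build_query(gene: str, phenotypes: list[str], match_all: bool = False) -> str:
    """Pair a gene with phenotypes; match any phenotype by default, or all of them."""
    terms = [f'"{p}"[tiab]' for p in phenotypes]
    joiner = " AND " if match_all else " OR "
    return f"{gene}[tiab] AND ({joiner.join(terms)})"


def esearch_url(query: str, max_articles: int) -> str:
    """Build an esearch request; max_articles plays the role of the UI slider."""
    params = {"db": "pubmed", "term": query, "retmax": max_articles, "retmode": "json"}
    return f"{ESEARCH_URL}?{urllib.parse.urlencode(params)}"


def search_pubmed(query: str, max_articles: int) -> list[str]:
    """Return up to max_articles PubMed IDs (PMIDs) matching the query."""
    with urllib.request.urlopen(esearch_url(query, max_articles), timeout=30) as resp:
        return json.load(resp)["esearchresult"]["idlist"]


def passages_text(payload) -> list[str]:
    """Collect non-empty passage texts from a BioC JSON payload (one collection or a list)."""
    collections = payload if isinstance(payload, list) else [payload]
    return [
        passage["text"]
        for collection in collections
        for document in collection.get("documents", [])
        for passage in document.get("passages", [])
        if passage.get("text")
    ]


def fetch_full_text(pmid: str) -> Optional[list[str]]:
    """Return an article's passages, or None if it is unavailable or unparseable."""
    try:
        with urllib.request.urlopen(BIOC_URL.format(id=pmid), timeout=30) as resp:
            return passages_text(json.load(resp))
    except (OSError, ValueError):  # network/HTTP errors, or a non-JSON reply
        return None


def collect_articles(gene: str, phenotypes: list[str], max_articles: int = 10,
                     match_all: bool = False) -> dict[str, list[str]]:
    """Search PubMed, then keep only the articles whose full text could be parsed."""
    pmids = search_pubmed(build_query(gene, phenotypes, match_all), max_articles)
    texts = {pmid: fetch_full_text(pmid) for pmid in pmids}
    return {pmid: passages for pmid, passages in texts.items() if passages}
```

Because the BioC service covers only open-access full texts, some PubMed hits will return `None` in this sketch and are simply dropped, which mirrors the "successfully parsed" filter mentioned in the next step.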
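The post does not specify which chunking scheme, retriever, or LLM the RAG pipeline uses, so the sketch below is only a stand-in under stated assumptions. It splits article text into overlapping word windows, ranks chunks against the query with plain TF-IDF cosine similarity (swapped in for whatever retriever the tool actually uses), and builds a prompt from numbered excerpts. The call to the LLM itself is omitted. The function names and the chunk size and overlap defaults are hypothetical.

```python
# Hypothetical sketch: TF-IDF retrieval stands in for the tool's unspecified retriever.
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunks: list[str], query: str, k: int = 3) -> list[str]:
    """Rank chunks by TF-IDF cosine similarity to the query and return the top k."""
    docs = [tokenize(c) for c in chunks]
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + len(docs)) / (1 + n)) + 1 for t, n in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    # Query terms that never appear in the chunks carry no weight.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    ranked = sorted(range(len(chunks)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [chunks[i] for i in ranked[:k]]


def build_prompt(gene: str, phenotypes: list[str], chunks: list[str]) -> str:
    """Ground the LLM in numbered excerpts so its summary can cite them."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (f"Using only the numbered excerpts below, summarize what is known about "
            f"{gene} in relation to {', '.join(phenotypes)}. Cite excerpts by number, "
            f"and say so if the excerpts are insufficient.\n\n{context}")
```

Numbering the excerpts is one simple way to tie a generated answer back to the specific chunks shown in an expandable result card; an embedding-based retriever could replace `retrieve` without changing the rest of the flow.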
The Future of Context GenI AI to Streamline Genomics Research
Our team saw promising results during the creation and initial testing of Context GenI AI, and we see potential for further enhancements to the tool. We are currently deploying Context GenI AI in the RTI Merge™ environment so that we can expand our user base and solicit feedback to improve the user experience. Broader use will also let us test how well the tool performs across a wide variety of genes and gene-phenotype pairs, and refine the RAG pipeline as needed to make its results more consistent. In addition, we plan to test a more powerful cloud-based LLM, such as ChatGPT, for answer generation to determine whether it improves the relevance of article summaries.
These improvements—along with user feedback and input—can help us enhance Context GenI AI for widespread use and implementation across projects, offering a better approach to genomic data interpretation.