Many research projects struggle to analyze large volumes of audio data efficiently. As a result, valuable insights remain untapped, and teams miss opportunities for novel data analysis, effective quality control, and searching recorded content. RTI QUINTET™ offers a suite of AI tools for transcription, search, data analysis, and data quality checks that transforms previously inaccessible audio data into useful information. With RTI QUINTET, teams can unlock novel analysis capabilities, reduce costs, and improve the quality of their data.
Making Audio AI Data Analysis Easy and Modular
RTI QUINTET consists of several components that can be used independently or together to address common needs when working with recorded data. For transcription, RTI QUINTET uses OpenAI’s Whisper model, which rivals the accuracy of human transcription but is up to 80 times faster, so a transcript of a 30-minute recording can be generated in minutes rather than hours or days. An optional audio preprocessing module improves transcription accuracy for dynamic, real-world recordings by enhancing audio clarity and filtering out background noise such as wind and traffic. This helps keep transcripts accurate even when individuals speak at varying volumes or move between locations.
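RTI has not described how its preprocessing module works internally. As a rough illustration only, the sketch below assumes two common steps: a high-pass filter that suppresses low-frequency noise such as wind and traffic rumble, and frame-by-frame gain normalization that evens out speakers at different volumes. The function names and parameter values (a 100 Hz cutoff, 50 ms frames, a target RMS of 0.1) are hypothetical; a production pipeline would likely use more sophisticated noise reduction.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz=100.0):
    """One-pole (RC) high-pass filter (hypothetical preprocessing step).

    Attenuates energy below roughly cutoff_hz, where much wind and traffic
    rumble sits, while passing most speech frequencies.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = prev_out = 0.0
    for x in samples:
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def normalize_frames(samples, frame_len, target_rms=0.1, max_gain=10.0):
    """Scale each frame toward a target RMS level so quiet and loud speakers
    end up at similar volumes. max_gain keeps near-silent frames (mostly
    background) from being amplified without limit."""
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        gain = min(max_gain, target_rms / rms) if rms > 0 else 1.0
        out.extend(s * gain for s in frame)
    return out


def preprocess(samples, sample_rate, frame_ms=50):
    """Hypothetical pipeline: filter low-frequency noise, then level volume."""
    filtered = high_pass(samples, sample_rate)
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    return normalize_frames(filtered, frame_len)
```

The filter runs before normalization so that background rumble does not inflate a frame's measured loudness and suppress the speech it contains.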
RTI QUINTET also offers diarization, which identifies how many speakers are in a recording and when each one is talking; combined with the transcript, this produces a speaker-specific transcript. The speaker information can also support analysis and quality control, such as verifying the number of speakers or performing natural language processing on each speaker’s language.
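The post does not say how speaker turns are merged with the transcript. A common approach, assumed here, is to assign each timestamped transcript segment to the speaker whose diarization turns overlap it the most. The sketch assumes diarization yields (speaker, start, end) turns and transcription yields segments with start and end times (Whisper’s output does include per-segment timestamps); the class names, helper functions, and labels such as `SPEAKER_00` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Turn:
    """One diarization result: who spoke, and when (in seconds)."""
    speaker: str
    start: float
    end: float


@dataclass
class Segment:
    """One timestamped transcript segment (in seconds)."""
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length of the time interval shared by two spans (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Assign each transcript segment to the speaker with the most overlap."""
    labeled = []
    for seg in segments:
        per_speaker = defaultdict(float)
        for turn in turns:
            per_speaker[turn.speaker] += overlap(seg.start, seg.end, turn.start, turn.end)
        best = max(per_speaker, key=per_speaker.get, default=None)
        if best is None or per_speaker[best] == 0.0:
            best = "UNKNOWN"  # no diarized speech overlaps this segment
        labeled.append((best, seg.text))
    return labeled


def talk_time(turns):
    """Total seconds per speaker; the number of keys is the speaker count."""
    totals = defaultdict(float)
    for turn in turns:
        totals[turn.speaker] += turn.end - turn.start
    return dict(totals)
```

Summing overlap per speaker, rather than taking the single best turn, keeps a segment with the right speaker when that speaker’s turn was split into several short pieces.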
Transcripts are loaded into a vector database, which represents transcript text as numerical vectors (embeddings) that capture its meaning and enables semantic search across transcripts. For quality control, this means a user can search transcripts for known questions from an interview questionnaire and retrieve the relevant snippets, even when speakers phrase the questions differently or the transcription contains errors. Users can also search for common themes, topics, or phrases across transcripts to support qualitative research.
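The core idea behind this kind of search can be sketched in a few lines: every transcript snippet is stored as a unit-length vector, the query is embedded the same way, and snippets are ranked by cosine similarity. RTI has not named its embedding model or vector database, so `toy_embed` and `VectorIndex` below are hypothetical stand-ins. `toy_embed` hashes character trigrams, so it only matches shared spelling; a real sentence-embedding model is what lets a search match different wording with the same meaning, and a real vector database would replace the brute-force scan with an approximate nearest-neighbor index.

```python
import math
import zlib

DIM = 512  # hypothetical vector size for the toy embedding


def toy_embed(text):
    """Hypothetical stand-in for a sentence-embedding model.

    Counts hashed character trigrams and L2-normalizes the result. Unlike a
    real embedding model, it captures shared spelling, not meaning.
    """
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorIndex:
    """Minimal in-memory index: brute-force cosine similarity over unit vectors."""

    def __init__(self, embed):
        self.embed = embed
        self.items = []  # (snippet_id, text, vector)

    def add(self, snippet_id, text):
        self.items.append((snippet_id, text, self.embed(text)))

    def search(self, query, k=3):
        q = self.embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, vec)), snippet_id, text)
            for snippet_id, text, vec in self.items
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]
```

For example, after adding a snippet such as "Do you agree to take part in this survey today?", a search for the scripted question "Do you agree to participate in this survey?" ranks that snippet first.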
Case Study 1: Maintaining Accuracy in Interviewer-Administered Surveys and Computer-Assisted Recorded Interviewing
Interviewer-administered surveys can suffer from quality issues when interviewers’ question wording differs from the standardized questionnaire. Quality assurance staff typically review survey recordings manually to find these discrepancies, but the process is time-consuming: a reviewer must listen to hours of recordings and compare what the interviewer said with the questionnaire. Given the many hours of recordings that large-scale surveys generate, reviewing every interview is also cost-prohibitive, so usually only a random sample receives manual review. Random sampling is inefficient: staff spend time on interviews that have no quality issues, and interviews with problems go unreviewed when they fall outside the sample, along with the chance to improve quality.
RTI QUINTET increases efficiency by using automated transcription and semantic search to help quality assurance teams prioritize recordings that are likely to have problems. For example, our team used RTI QUINTET on a health care survey to find interviews where the consent statement was not read verbatim and flag them for further review. RTI QUINTET quickly identified 17 instances of interviewer wording errors and one instance where the consent statement was not read at all, suggesting that the interviewers involved may benefit from retraining. Automatically identifying these interviews lets quality assurance teams focus on the ones most likely to need follow-up, rather than on a random sample that may include only interviews administered as intended.
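The post does not detail how RTI QUINTET decides that a statement was not read verbatim. One plausible approach, assumed here, is to take the best-matching snippet that semantic search returns for each interview and compare its words with the scripted text using Python’s `difflib.SequenceMatcher`: moderate similarity suggests changed wording, and very low similarity or no match suggests the statement was skipped. The `triage` helper and both thresholds are hypothetical and would need calibration against human-reviewed recordings, since transcription errors also lower the score.

```python
import difflib
import re


def words(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def verbatim_score(script, spoken):
    """Word-level similarity in [0, 1]; 1.0 means the wording matches exactly."""
    return difflib.SequenceMatcher(None, words(script), words(spoken)).ratio()


def triage(script, best_matches, verbatim_threshold=0.9, present_threshold=0.4):
    """Flag interviews for review (hypothetical thresholds).

    best_matches maps interview ID -> best-matching transcript snippet for the
    scripted text (e.g., from a semantic search), or None if nothing matched.
    Interviews that read the script closely enough are not flagged.
    """
    flags = {}
    for interview_id, snippet in best_matches.items():
        score = verbatim_score(script, snippet) if snippet else 0.0
        if score < present_threshold:
            flags[interview_id] = "possibly not read"
        elif score < verbatim_threshold:
            flags[interview_id] = "wording differs"
    return flags
```

Comparing word tokens rather than characters keeps small spelling or punctuation differences in the transcript from dominating the score.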
RTI QUINTET can also detect the number of speakers in a recording to help combat survey falsification; for example, a recording with only one voice may indicate that the interviewer completed the survey without a respondent. For each recording, we apply voice activity detection (determining which parts of the recording contain a human voice), calculate embeddings, cluster the embeddings with multiple algorithms under the assumption that two distinct voices are present, and then test whether the two-voice assumption is supported. This identifies the number of voices in a recording more quickly than manual review.
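The clustering and support-check steps can be sketched as follows. This is an illustration under stated assumptions, not RTI’s implementation: it assumes voice activity detection has already run and that one embedding per speech segment is available (for example, from a speaker-embedding model). Where RTI uses multiple clustering algorithms, this sketch uses only two-cluster k-means, and it judges the two-voice assumption with the mean silhouette score; the function names and the 0.5 threshold are hypothetical.

```python
import math


def kmeans_2(points, iters=100):
    """Two-cluster k-means with deterministic farthest-point initialization."""
    c0 = list(points[0])
    c1 = list(max(points, key=lambda p: math.dist(p, c0)))
    centers = [c0, c1]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
                  for p in points]
        new_centers = []
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[k])  # keep an emptied cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def mean_silhouette(points, labels):
    """Average silhouette: near 1 for well-separated clusters, near 0 for overlap."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        other = [math.dist(p, q) for j, q in enumerate(points) if labels[j] != labels[i]]
        if not same or not other:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        a = sum(same) / len(same)    # mean distance within own cluster
        b = sum(other) / len(other)  # mean distance to the other cluster
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


def two_voices_supported(embeddings, threshold=0.5):
    """Split embeddings into two clusters and test whether the split is real."""
    labels = kmeans_2(embeddings)
    if len(set(labels)) < 2:
        return False, 0.0  # everything fell into one cluster
    score = mean_silhouette(embeddings, labels)
    return score >= threshold, score
```

A low score means the two clusters overlap heavily, which is what one would expect if a single person produced all of the speech.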
Case Study 2: Using Body-Worn Camera Footage to Improve Criminal Justice Practices
Law enforcement agencies are increasingly adopting body-worn cameras (BWCs) to enhance transparency, accountability, and officer training. With the high volume of footage generated, it can be challenging to review and analyze interactions systematically to ensure adherence to training protocols and best practices. In addition, real-world data collected through BWCs often include varied recording conditions and background noise that make transcription challenging. Without an efficient method to analyze BWC footage, agencies may miss opportunities to provide constructive feedback, identify exemplary interactions, and ensure that officers are consistently applying effective techniques.
With RTI QUINTET, law enforcement agencies can transcribe and analyze BWC footage of interactions between officers and clients or the public, then use it to provide feedback that highlights examples of effective interactions. RTI QUINTET’s preprocessing techniques can also increase transcription accuracy by 30%, matching human transcription performance. Implementing RTI QUINTET supports more effective supervision and more consistent, comprehensive feedback, giving officers actionable insights for continuous improvement and contributing to more positive outcomes in the criminal justice system.
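Transcription accuracy is commonly measured as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a machine transcript into a human reference transcript, divided by the number of reference words. The post does not name its metric, so the sketch below only assumes that convention and shows how an improvement could be expressed as a relative WER reduction. Both function names are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion (reference word missing)
                cur[j - 1] + 1,          # insertion (extra hypothesis word)
                prev[j - 1] + (r != h),  # substitution, or a match if equal
            )
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(wer_before, wer_after):
    """Fractional drop in WER; 0.3 means 30% fewer word errors."""
    return (wer_before - wer_after) / wer_before
```

Under this convention, a human transcriber’s own WER against an independent reference would be the benchmark for "matching human performance."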
Streamline Audio Data Analysis for Your Research
Interested in unlocking novel analysis capabilities, reducing costs, and enhancing data quality with RTI QUINTET? Contact us today to learn how our AI tools for data analysis can transform your audio data into insights.