Events
2020 Joint Statistical Meetings | Leveraging Statistical Innovations to Use Data for the Public Good
Date
August 2, 2020 12:00am - August 6, 2020 11:59pm
Join RTI at the 2020 virtual Joint Statistical Meetings (JSM) to hear how we are turning knowledge into practice and leveraging data for the public good. Hosted by the American Statistical Association (ASA), JSM is one of the largest statistical events in the world, drawing more than 6,500 attendees and including more than 600 sessions.
With experience conducting complex statistical analyses and ensuring the quality, validity, and reliability of data from surveys and research studies across laboratory and social sciences, our statisticians are skilled in managing changing variables to turn knowledge into practice.
Though this year's JSM conference will be held as a virtual event, we are still looking forward to engaging with you and showing how we use data for the public good.
At JSM 2020, our experts will participate in 14 different sessions, serving as chairs and discussants and sharing their research on topics ranging from machine learning to nonignorable nonresponse.
Featured Sessions
Engage with Us at the Following Sessions
August 3, 2020
Statistical Measurements of Social Issues and Trends
10:00AM - 2:00PM
Chair: Antje Kirchner, PhD
Methodological Developments and Implications for Social Scientists
10:00AM - 2:00PM
Chair: Karol Krotki, PhD
Technology Impact on Total Survey Error
10:00AM - 2:00PM
Chair: Antje Kirchner, PhD
Statistical Computing: Data Science
10:00AM - 2:00PM
Paper: Clinical Trials Data Sharing: Streamlined Process for Deidentification
Presenter: Amaanti Sridhar, RTI International
Additional Author: Marie Gantz, RTI International
Summary: NIH requires timely sharing of scientific data resulting from all NIH-funded or conducted research. Preparing data for sharing is an iterative process of assessing and mitigating identification risks. For the shared database to be user-friendly and proficiently utilized, detailed documentation on the deidentification process is crucial. We will discuss a successfully implemented strategy that streamlined this process by (a) developing an efficient method to maintain documentation, and (b) creating reusable code (SAS macros) for determining potentially identifiable data, and performing and documenting deidentification.
Technology Impact on Total Survey Error
10:05AM - 10:25AM
Paper: Detecting Housing Units from Satellite Imagery Using Computer Vision
Presenter: Stephanie Eckman, PhD, RTI International
Additional Authors: Qiang Qiu, Purdue University, and Tien-Yu Liu, Duke University
Summary: High-quality surveys require a list of housing units from which to select a sample. Each year, several US studies send field staff out to create these lists. Not only is this process expensive, inefficient, and redundant, it also tends to miss some housing units, particularly in rural areas. This new approach uses computer vision techniques to detect dwellings in satellite images; the team trained an algorithm to detect housing units in several counties in North Carolina. The team is working on expanding the approach to detect demographics of households in rural areas to assist surveys that want to oversample some demographic groups.
August 4, 2020
Contemporary Machine Learning
10:00AM - 2:00PM
Paper: Transfer Learning for Auto-Coding Free-Text Survey Responses
Presenter: Peter Baumgartner, RTI International
Additional Authors: Amanda Smith, RTI International, Murrey Olmsted, RTI International, Dawn Ohse, RTI International, and Bucky Fairfax, RTI International
Summary: Coding responses from free-text, open-ended survey questions (i.e., qualitative coding) can be a labor-intensive process. Traditional machine learning (ML) methods for text classification require large amounts of training data usually not available from conventional survey sample sizes. With that problem in mind, we evaluated an ML approach for auto-coding free-text survey responses, employing data augmentation and recent advances in transfer learning models for natural language processing.
Experimental Design
10:00AM - 2:00PM
Paper: The Future is Linked: Making Predictions with Data Sets Linked to Synthetic Populations
Presenter: Emily Hadley, RTI International
Additional Authors: Caroline Kery, RTI International, Georgiy Bobashev, RTI International, and Lauren Grattan, RTI International
Summary: We describe three methods for linking datasets with synthetic data: resampling, modeling predictors independently, and modeling predictors sequentially. We apply these methods to the prediction of the prevalence of Florida youth vaping by county and census tract using the 2018 Florida Youth Substance Abuse Survey (FYSAS) and synthetic records generated from the 5-Year American Community Survey (ACS). We discuss opportunities to apply this work in other fields, including restricted settings like health records.
Survey Research Methods
10:00AM - 2:00PM
Poster: Enhancing the Quality of Administrative Data for Statistical Use
Presenter: Dan Liao, RTI International
Additional Authors: Marcus Berzofsky, RTI International, and Alexia Cooper, Bureau of Justice Statistics
Summary: Recent decades have witnessed an explosion of administrative data due to digital transformation, which creates new research opportunities. However, like other types of data, administrative data can carry various types of errors that limit its use. We will illustrate our experience in enhancing the quality of an administrative database collected from a national incident-based crime-reporting system in the US. This discussion will help attendees plan their own data quality enhancement work.
Business and Economic Statistics
10:00AM - 2:00PM
Poster: Automated Forecasting for Business Improvement
Presenter: Satkartar Kinney, RTI International
Summary: This presentation will describe an intra-institution collaboration between statisticians, data scientists, financial analysts, software developers, and data warehousing specialists to build an interactive online tool that uses historical data to provide up-to-date financial forecasts for a nonprofit research institution. The forecasts provide the institution with key measures of business strength that are crucial to informed short-term and long-range planning.
August 5, 2020
Making Sense of Network Data and Randomized Response
10:00AM - 2:00PM
Chair: Stephanie Eckman, PhD
How the Council of Professional Associations on Federal Statistics Advances Excellence in Federal Statistics
11:20AM - 11:45AM
Discussant: Steven Cohen, PhD
The State-of-the-Art Developments with Nonignorable Nonresponse
11:20AM - 11:45AM
Paper: Weighting for Item Imputation with Nonignorable Nonresponse
Presenter: Phillip Kott, PhD, RTI International
Summary: Imputation for item nonresponse in official surveys often takes the form of predicting the missing item value based on values of frame and nonmissing survey items for the same unit. This methodology depends on assuming two different models: a response model and an imputation model for the missing item. Moreover, one cannot test whether item response is truly ignorable.
August 6, 2020
Privacy, Confidentiality, and Disclosure Limitation
10:00AM - 2:00PM
Chair: Phillip Kott, PhD
125 Years of Representative Sampling: Important Contributions in the History of Survey Statistics
11:05AM - 11:25AM
Paper: Ralph Folsom's 1991 Statistical Proceedings Paper: A Seminal Contribution to Calibration Weighting of Nonresponse You Likely Never Heard Of
Presenter: Phillip Kott, RTI International
Additional Author: Jeremy Aldworth, RTI International
Summary: Few know that Folsom (1991) employed what would later be called calibration equations to adjust for unit nonresponse. He showed that using raking weights leads to nearly unbiased estimated totals when the log of the probability of a unit response is a linear function of the raking variables. This presentation will review his important contributions to calibration weighting, as well as point out a limitation of Deville and Särndal’s that Folsom’s work overcomes.
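For readers unfamiliar with raking, the summary above can be made concrete with a minimal sketch of generic iterative proportional fitting. This is not Folsom's derivation or the presenters' code: the function name `rake`, the toy respondents, and the population margins are all hypothetical, chosen only to show how respondent weights are repeatedly scaled until their weighted totals match known population totals for each raking variable.

```python
from collections import defaultdict


def rake(respondents, base_weights, margins, max_iter=200, tol=1e-10):
    """Generic raking (iterative proportional fitting) sketch.

    respondents:  list of dicts mapping raking variable -> category
    base_weights: list of starting (design) weights, one per respondent
    margins:      dict of variable -> {category: known population total}
    Returns a list of adjusted weights.
    """
    weights = list(base_weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, targets in margins.items():
            # Current weighted total for each category of this variable.
            totals = defaultdict(float)
            for unit, w in zip(respondents, weights):
                totals[unit[var]] += w
            # Scale weights so this variable's weighted totals hit the targets.
            for i, unit in enumerate(respondents):
                factor = targets[unit[var]] / totals[unit[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        # Stop once a full pass over all variables changes almost nothing.
        if max_change < tol:
            break
    return weights


# Hypothetical toy data: five respondents, two raking variables.
respondents = [
    {"sex": "F", "region": "N"},
    {"sex": "F", "region": "S"},
    {"sex": "M", "region": "N"},
    {"sex": "M", "region": "S"},
    {"sex": "M", "region": "S"},
]
base_weights = [10.0, 10.0, 10.0, 10.0, 10.0]

# Hypothetical known population totals (both sets sum to 100).
margins = {
    "sex": {"F": 52.0, "M": 48.0},
    "region": {"N": 40.0, "S": 60.0},
}

raked = rake(respondents, base_weights, margins)
for unit, w in zip(respondents, raked):
    print(unit, round(w, 3))
```

Each pass multiplies a respondent's weight by one factor per raking variable, so the final adjustment is a product of category-level factors. That multiplicative form is what ties raking to the condition in the summary, where the log of the response probability is linear in the raking variables.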
125 Years of Representative Sampling: Important Contributions in the History of Survey Statistics
11:05AM - 11:25AM
Poster: A Comparison of Outlier Detection Methods to Detect Irregular Counts in Administrative Data: A Case Study with NIBRS Data
Presenter: John Bunker, RTI International
Summary: Outlier detection for continuous variables in administrative data and establishment surveys can be challenging due to variation in the size of reporting units and time trends across them. Without proper detection and treatment for the outliers, statistics generated from suspect data may lead to inaccurate or spurious conclusions. In this presentation, we will compare several major outlier detection methods for detecting unusual crime counts reported by police agencies in the FBI’s National Incident-Based Reporting System (NIBRS) data.