Robust genome-wide ancestry inference for heterogeneous datasets
Illustrated using the 1,000 genome project with 3D facial images
Li, J., Zarzar, T. G., White, J. D., Indencleef, K., Hoskens, H., Matthews, H., Nauwelaers, N., Zaidi, A., Eller, R. J., Herrick, N., Günther, T., Svensson, E. M., Jakobsson, M., Walsh, S., Van Steen, K., Shriver, M. D., & Claes, P. (2020). Robust genome-wide ancestry inference for heterogeneous datasets: Illustrated using the 1,000 genome project with 3D facial images. Scientific Reports, 10(1), 11850. https://doi.org/10.1038/s41598-020-68259-w
Estimates of individual-level genomic ancestry are routinely used in human genetics and related fields. The analysis of population structure and genomic ancestry can yield insights into modern and ancient populations, allowing us to address questions regarding admixture and the numbers and identities of the parental source populations. Unrecognized population structure is also an important confounder to correct for in genome-wide association studies. However, it remains challenging to work with heterogeneous datasets from multiple studies collected by different laboratories with diverse genotyping and imputation protocols. This work presents a new approach and an accompanying open-source toolbox that facilitate a robust integrative analysis of population structure and genomic ancestry estimates for heterogeneous datasets. We show robustness against individual outliers and different protocols for the projection of new samples into a reference ancestry space, and the ability to reveal and adjust for population structure in a simulated case-control admixed population. Given that visually evident and easily recognizable patterns of human facial characteristics co-vary with genomic ancestry, and based on the integration of three different sources of genome data, we generate average 3D faces to illustrate genomic ancestry variations within the 1,000 Genome project and for eight ancient-DNA profiles, respectively.
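As an illustration of what projecting new samples into a reference ancestry space can look like, the sketch below fits an ordinary principal component analysis on a reference genotype matrix and then maps new samples onto the same axes. It is a minimal sketch under stated assumptions, not the robust procedure or the toolbox described in the paper: it assumes genotypes coded as 0/1/2 allele counts, uses plain (non-robust) PCA, and fills missing genotypes with the reference mean. The function names `fit_reference_pca` and `project_samples` are hypothetical.

```python
import numpy as np


def fit_reference_pca(ref_genotypes, n_components=2):
    """Fit a plain PCA ancestry space on reference samples (rows) by SNPs (columns)."""
    ref = np.asarray(ref_genotypes, dtype=float)
    mean = ref.mean(axis=0)
    scale = ref.std(axis=0)
    scale[scale == 0] = 1.0  # leave monomorphic SNPs unscaled to avoid dividing by zero
    z = (ref - mean) / scale
    # Rows of vt are the principal axes in SNP space.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[:n_components].T
    return mean, scale, loadings


def project_samples(genotypes, mean, scale, loadings):
    """Project samples onto the reference axes using the reference mean and scale."""
    g = np.asarray(genotypes, dtype=float)
    z = (g - mean) / scale
    # Assumption: a missing genotype (NaN) is replaced by the reference mean,
    # which is zero after centering.
    z = np.where(np.isnan(z), 0.0, z)
    return z @ loadings
```

Because new samples are standardized with the reference mean and scale rather than their own, adding or removing projected samples does not move the reference axes. Making the fit itself resistant to individual outliers, as the abstract describes, would require additional steps that this sketch does not attempt.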