Generalized nonlinear models can solve the prediction problem for data from species-stratified use-availability designs
Johnson, N. G., Williams, M. R., Riordan, E. C., & Brito, J. (Ed.) (2021). Generalized nonlinear models can solve the prediction problem for data from species-stratified use-availability designs. Diversity and Distributions, 27(11), 2077-2092. Advance online publication. https://doi.org/10.1111/ddi.13384
Aim
Habitat suitability modelling methods for presence-only species data are limited in their ability to make true predictions and are therefore often misused in ecological applications. A use-availability design (also known as a case-control design with contaminated controls) combines presence-only species data with a background sample of covariates at which species presence/absence is unknown. Assuming a log link function for the true probability of presence/absence, the use-availability data can then be analysed as a logistic regression model with a biased estimate of the intercept. Because the intercept is biased, the model cannot make true predictions. Instead, ranking the "pseudo-predictions" from the model with the biased intercept provides a viable alternative for predictive inference in single-species models. We show that the ranks are no longer conserved across species when such a single-species model is extended to multiple species, which limits predictive inference.

Innovation
By assuming a logit link function for the true probabilities of the presence/absence data, the resulting model allows predictive inference even when extended to multiple species. We provide theoretical details justifying both a fully Bayesian approach and a large-background-sample asymptotic Bayesian approach to the resulting generalized nonlinear model.

Main conclusions
We illustrate how multiple species can be analysed with these approaches in R and Stan, using presence-only data for foundational shrubland taxa occurring in California, USA. Predictive inference highlights differences in habitat suitability rankings among individual species and among infraspecific taxa within a single species, improving the application of habitat suitability models to the ecological restoration of southern California shrublands.
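The biased-intercept result for the log link can be checked with a small simulation. The sketch below is not the authors' R/Stan code. It assumes a hypothetical landscape with one standardized covariate, a true presence probability exp(b0 + b1 x), and made-up sample sizes; the helper names `fit_logistic` and `simulate` are illustrative only. Under these assumptions, pooling n_u presence sites ("used") with n_a random background sites ("available") gives logit P(used | x) = log(n_u / (n_a π)) + b0 + b1 x, where π is the landscape-average probability of presence. An ordinary logistic regression should therefore recover the slope b1 but return an intercept shifted by log(n_u / (n_a π)).

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit an ordinary logistic regression by Newton-Raphson (IRLS).

    X must already contain a column of ones for the intercept.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))      # fitted P(used | x)
        w = mu * (1.0 - mu)                   # IRLS weights
        grad = X.T @ (y - mu)                 # score vector
        hess = X.T @ (X * w[:, None])         # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def simulate(b0=-3.0, b1=0.5, n_landscape=1_000_000,
             n_used=20_000, n_avail=20_000, seed=1):
    """Hypothetical use-availability simulation under a LOG link."""
    rng = np.random.default_rng(seed)
    # Hypothetical landscape: one standardized covariate per site.
    x = rng.standard_normal(n_landscape)
    # Log link for the TRUE probability of presence (stays below 1 here).
    p_true = np.exp(b0 + b1 * x)
    presence = rng.random(n_landscape) < p_true

    # "Use" sample: sites where the species is present (presence-only data).
    used = rng.choice(np.flatnonzero(presence), n_used, replace=False)
    # "Availability" sample: random background sites with unknown presence,
    # so some are contaminated controls (actually occupied).
    avail = rng.choice(n_landscape, n_avail, replace=False)

    x_fit = np.concatenate([x[used], x[avail]])
    y_fit = np.concatenate([np.ones(n_used), np.zeros(n_avail)])
    X = np.column_stack([np.ones_like(x_fit), x_fit])
    intercept, slope = fit_logistic(X, y_fit)

    prevalence = p_true.mean()                # pi = average P(presence)
    expected_intercept = b0 + np.log(n_used / (n_avail * prevalence))
    return intercept, slope, expected_intercept


if __name__ == "__main__":
    intercept, slope, expected = simulate()
    print(f"slope      {slope:.3f}  (true b1 = 0.5)")
    print(f"intercept  {intercept:.3f}  (true b0 = -3.0, "
          f"expected biased value {expected:.3f})")
```

Because exp(intercept + slope x) differs from the true exp(b0 + b1 x) only by the constant factor exp(intercept - b0), the pseudo-predictions preserve the ranking of sites for this single species even though their absolute values are not true probabilities.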