Review article

A practical guide for combining data to model species distributions

Robert J. Fletcher, Department of Wildlife Ecology and Conservation, University of Florida, P.O. Box 110430, 110 Newins‐Ziegler Hall, Gainesville, Florida 32611‐0430, USA
Trevor J. Hefley, Department of Statistics, Kansas State University, 205 Dickens Hall, Manhattan, Kansas 66506‐0802, USA
Ellen P. Robertson, Department of Wildlife Ecology and Conservation, University of Florida, P.O. Box 110430, 110 Newins‐Ziegler Hall, Gainesville, Florida 32611‐0430, USA
Benjamin Zuckerberg, Department of Forest and Wildlife Ecology, University of Wisconsin, 226 Russell Labs, 1630 Linden Drive, Madison, Wisconsin 53706‐1598, USA
Robert A. McCleery, Department of Wildlife Ecology and Conservation, University of Florida, P.O. Box 110430, 110 Newins‐Ziegler Hall, Gainesville, Florida 32611‐0430, USA
Robert M. Dorazio, Department of Biology, San Francisco State University, 1600 Holloway Avenue, San Francisco, California 94132, USA

2019

Abstract

Understanding and accurately modeling species distributions lies at the heart of many problems in ecology, evolution, and conservation. Multiple sources of data are increasingly available for modeling species distributions, such as data from citizen science programs, atlases, museums, and planned surveys. Yet reliably combining data sources can be challenging because data sources can vary considerably in their design, gradients covered, and potential sampling biases. We review, synthesize, and illustrate recent developments in combining multiple sources of data for species distribution modeling. We identify five ways in which multiple sources of data are typically combined for modeling species distributions. These approaches vary in their ability to accommodate sampling design, bias, and uncertainty when quantifying environmental relationships in species distribution models. Many of the challenges for combining data are solved through the prudent use of integrated species distribution models: models that simultaneously combine different data sources on species locations to quantify environmental relationships for explaining species distribution. We illustrate these approaches using planned survey data on 24 species of birds coupled with opportunistically collected eBird data in the southeastern United States. This example illustrates some of the benefits of data integration, such as increased precision in environmental relationships, greater predictive accuracy, and accounting for sample bias. Yet it also illustrates challenges of combining data sources with vastly different sampling methodologies and amounts of data. We provide one solution to this challenge through the use of weighted joint likelihoods. Weighted joint likelihoods provide a means to emphasize data sources based on different criteria (e.g., sample size), and we find that weighting improves predictions for all species considered. We conclude by providing practical guidance on combining multiple sources of data for modeling species distributions.
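The abstract does not give model formulas, so the following is only a minimal sketch, under assumed choices, of what a weighted joint likelihood for an integrated species distribution model can look like. It assumes a complementary log-log Bernoulli model for planned-survey detection/non-detection, a Poisson count model for opportunistic records per grid cell, one environmental slope `b1` shared by both data sources, and a hypothetical sampling-effort covariate `z` that absorbs bias in the opportunistic data. The weights (`w_survey`, `w_opp`) scale each source's log-likelihood, and here `w_opp` is set by relative sample size as one possible criterion. All function names, parameters, and toy values are hypothetical illustrations, not the authors' implementation or results.

```python
import math

def survey_loglik(b0, b1, x, y):
    """Bernoulli log-likelihood for planned-survey detection/non-detection.

    Assumed cloglog link: P(present) = 1 - exp(-exp(b0 + b1*x)), i.e. the
    chance that a site with intensity exp(b0 + b1*x) holds >= 1 individual.
    """
    ll = 0.0
    for xi, yi in zip(x, y):
        lam = math.exp(b0 + b1 * xi)
        if yi == 1:
            ll += math.log(-math.expm1(-lam))  # log(1 - exp(-lam)), stable
        else:
            ll += -lam  # log(1 - p) = log(exp(-lam)) = -lam
    return ll

def opportunistic_loglik(a0, b1, c1, x, z, n):
    """Poisson log-likelihood for opportunistic counts per grid cell.

    Log-intensity a0 + b1*x + c1*z: b1 is shared with the survey model,
    while the hypothetical effort covariate z (slope c1) soaks up sampling bias.
    """
    ll = 0.0
    for xj, zj, nj in zip(x, z, n):
        eta = a0 + b1 * xj + c1 * zj
        ll += nj * eta - math.exp(eta) - math.lgamma(nj + 1)
    return ll

def weighted_joint_loglik(params, survey, opp, w_survey=1.0, w_opp=1.0):
    """Weighted sum of the two log-likelihoods; weights of 1 give the
    ordinary (unweighted) integrated model."""
    b0, a0, b1, c1 = params
    return (w_survey * survey_loglik(b0, b1, *survey)
            + w_opp * opportunistic_loglik(a0, b1, c1, *opp))

# Made-up toy data, for illustration only.
survey = ([0.1, 0.5, 0.9, 1.3], [0, 0, 1, 1])                # (x, y)
opp = ([0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6],             # x
       [1.0, 0.5, 0.0, 1.0, 0.5, 0.0, 1.0, 0.5],             # z (effort)
       [0, 1, 0, 2, 1, 1, 4, 3])                             # counts n
params = (-0.5, -1.0, 1.2, 0.3)                              # (b0, a0, b1, c1)

# One assumed weighting rule: down-weight the larger source by relative size.
w_opp = len(survey[0]) / len(opp[0])
print(weighted_joint_loglik(params, survey, opp, 1.0, w_opp))
```

In practice the parameters would be estimated by maximizing this weighted objective (for example with a general-purpose numerical optimizer); because `b1` appears in both terms, each data source informs the shared environmental relationship in proportion to its weight.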

Identifiers

DOI: 10.1002/ecy.2710