Atlas of Breeding Birds of the
Sierra de Guadarrama National Park

III ATLAS OF BIRDS
during the breeding season in Spain

General methodology

Analysis methodology

Distribution of birds in the Sierra de Guadarrama National Park 2024

Data

To develop the distribution models based on the field sampling carried out, information from 277 UTM 1×1 km grid cells was used. During the surveys, an effort was made to complete each grid cell with at least one transect of approximately 15 minutes or 500 m. In total, 286 transects were completed, with a total length of approximately 145 km.

The information collected in each transect on species abundance and habitats was used to perform an analysis of habitat selection for each species (see below). On the other hand, to create the distribution models (or probability of presence models), presence–absence data for the species at the scale of 1×1 km UTM cells was used, combining both direct fieldwork information and other sources that could confirm species presence at that scale. This allowed increasing the number of observations for each species by incorporating presence records downloaded from the eBird platform (eBird Basic Dataset, 2024). Specifically, data were downloaded for the years 2020 to 2024 for the April–June period across the entire Sierra de Guadarrama and processed with the R package auk (Strimas-Mackey et al., 2023).

For each 1×1 km UTM grid cell, all observations of each species were extracted and classified as detected or not detected. Additionally, models were generated only for those species detected in at least 5 UTM grid cells within the transects obtained from this atlas’ fieldwork, to ensure a minimum amount of data for model application.

Environmental variables

Different variables describing various aspects of the territory were used for the models, such as topography, land use, and climate. These variables were extracted from different data sources and converted to a 1 km scale:

Mean elevation from the Multi-Error-Removed Improved-Terrain Digital Elevation Model (MERIT-DEM, Yamazaki et al., 2017).
Enhanced Vegetation Index (EVI) derived from Landsat 8 satellite images courtesy of the U.S. Geological Survey, matching the sampling period as closely as possible, i.e., spring 2024.
Land cover and percentage of main habitat type classified by observers during field surveys and obtained for the 1×1 km UTM cells from the High-Resolution Land Occupation Information System of Spain (SIOSE-AR) for the year 2017.
Bioclimatic variables derived from monthly temperature and precipitation values.

Distribution models

In this study, distribution models based on presence–absence data were applied to identify the environmental conditions associated with the probability of a species’ presence in each habitat type. It is important to note that these models do not directly describe the spatial distribution of the number of individuals; although it might be intuitively assumed that a high probability of presence corresponds to high abundance, this is not necessarily true. Probability of presence simply indicates that conditions allow the species to occur, while abundance depends on additional processes such as resource availability, intra- and interspecific competition, and habitat quality.

To build robust abundance models that accurately indicate where more or fewer birds are located, repeated sampling over several years is required—something that could not be achieved in this project.

In ecology, it is common for both species and environmental variables to show patterns of spatial autocorrelation, meaning that geographically close sites tend to be more similar to each other than distant ones (Dormann et al., 2007). If this spatial autocorrelation is not accounted for in statistical models, results may be biased. Therefore, to properly control for spatial autocorrelation, spatially explicit models incorporating spatial random effects were applied using the R package sdmTMB (Anderson et al., 2024).

Following the model requirements, a mesh was created using the k-means clustering algorithm to define the minimum distance allowed between points in the X and Y axes, measured in km (see Anderson et al., 2024, for more details).

The inclusion of spatial random effects allows the model to capture spatial influences not explained by the environmental variables but that still condition species distributions (for example, gregarious behaviour or species settling in particular areas for unknown reasons).

For the distribution or probability-of-presence models, a binomial distribution was used by converting abundances to Presence/Absence (1 and 0, respectively). To incorporate potential nonlinear relationships between environmental variables and species distributions, Generalized Additive Models (GAM) were used.

Furthermore, to reduce collinearity, only variables with a Pearson correlation coefficient below 0.7 were selected, retaining those with greater general biological meaning. All variables were scaled (i.e., centred on the mean) before inclusion to facilitate coefficient interpretation and improve model convergence (full-model approach; Whittingham et al., 2006).

Because each species has different ecological requirements, an average of 100 models was run for each one, and the model with the lowest BIC (Bayesian Information Criterion) was selected as the best fit.
After fitting the model, the sdmTMB function sanity() was used to verify the validity and stability of the estimates. This function performs a series of essential checks described in Anderson et al. (2024).

To evaluate the predictive capacity of the best model, 80% of the data was used for training and 20% for testing (Elith and Leathwick, 2009), and the AUC, Sensitivity (True Positive Rate, TPR), and Specificity (True Negative Rate, TNR) were calculated.

Finally, to produce a more detailed representation of the probability of presence, probability maps at the 1×1 km scale were downscaled to 100×100 m.

Habitat selection

Although the abundance information collected in only one year of fieldwork is insufficient to estimate species abundance, it does allow exploratory analyses of the relationship between species density and the habitat they occupy. For each transect, the total abundance of each species (i.e., the total number of individuals) was extracted, and density was calculated as individuals per 500 metres. This makes abundances comparable across transects of different lengths.

To study habitat selection, the different habitat categories recorded by volunteers in the field were translated into broader categories that allow clearer visualisation of relationships (figure 1). Boxplots were then used to explore these relationships.

Figure 1. Distribution map of each habitat type considered in the Sierra de Guadarrama National Park within the Community of Madrid.

Example of species results:

Coal Tit

Figure 2. Confirmed presence represented by points where the species was detected during the fieldwork of this atlas and through recent eBird observations (or both systems; direct fieldwork and eBird observations). In addition, the 1×1 km grid coordinates and the national park boundary are shown with a dashed line.

Figure 3. Modelled distribution of the species. Confirmed presence is represented by points where the species was detected in the direct fieldwork of this atlas and through recent eBird observations. Additionally, the probability of presence from 0 to 1 is shown at a 100 m resolution using a colour scale with different intensities (with 1 being a 100% probability of presence according to the model and a darker colour). Finally, the 1×1 km grid coordinates and the national park boundary are represented with a dashed line.

Figure 4. Dependence of the probability of presence based on the percentage of evergreen woodland in the 1×1 km UTM cells. The solid line shows the mean trend and the shaded area represents the standard error of the prediction.

Figure 5. Variability in species density per 500 metres of transect with respect to the habitat classification used, coloured according to level 1 of the habitat classification applied during the fieldwork of this atlas. The points show the distribution of the data; the diamonds represent the mean value; in the boxplot, the lower and upper limits correspond to the 25th and 75th percentiles, respectively; the thick horizontal line represents the median; and the whiskers represent the 10–90 percentiles.

Bibliography

Anderson, S. C., Ward, E. J., English, P. A., Barnett, L. A. K., & Thorson, J. T. (2024). sdmTMB: An R Package for Fast, Flexible, and User-Friendly Generalized Linear Mixed Effects Models with Spatial and Spatiotemporal Random Fields.
BioRxiv, 2022.03.24.485545. https://doi.org/10.1101/2022.03.24.485545

eBird Basic Dataset. (2024). eBird Basic Dataset.

Elith, J., & Leathwick, J. R. (2009). Species distribution models: ecological explanation and prediction across space and time. Annual Review of Ecology, Evolution, and Systematics, 40(1), 677–697.

F. Dormann, C., M. McPherson, J., B. Araújo, M., Bivand, R., Bolliger, J., Carl, G., G. Davies, R., Hirzel, A., Jetz, W., & Daniel Kissling, W. (2007). Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography, 30(5), 609–628.

Strimas-Mackey, M., Miller, E., & Hochachka, W. (2023). auk: eBird Data Extraction and Processing with AWK. https://cornelllabofornithology.github.io/auk/

Whittingham, M. J., Stephens, P. A., Bradbury, R. B., & Freckleton, R. P. (2006). Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology, 75(5), 1182–1189.

Yamazaki, D., Ikeshima, D., Tawatari, R., Yamaguchi, T., O’Loughlin, F., Neal, J. C., Sampson, C. C., Kanae, S., & Bates, P. D. (2017). A high-accuracy map of global terrain elevations. Geophysical Research Letters, 44(11), 5844–5853. https://doi.org/https://doi.org/10.1002/2017GL072874

Atlas of Breeding Birds of the Sierra de Guadarrama National Park

III ATLAS OF BIRDS during the breeding season in Spain