Technical Guide & Modeling Protocol
High-resolution (250 m) historical and projected CMIP6 bioclimatic variable pipeline for Bhutan — built for species distribution modeling, climate vulnerability assessment, and conservation science.
OverviewAbstract
This protocol describes the development and application of BhutanBioClims—a comprehensive, high-resolution (250 m) bioclimatic dataset for Bhutan derived from CMIP6 Global Climate Models (GCMs). The pipeline provides 121 sets of 19 standard BIOCLIM variables (Booth et al., 2014), spanning historical (1986–2015) and three future time slices (2021–2050, 2051–2080, 2071–2100) across four Shared Socioeconomic Pathways (SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5) using 10 GCMs.
CMIP6 GCM outputs were downscaled against historical data developed with Bhutan's national weather station network (Stewart et al., 2017; Stewart et al., 2021) using the delta change method applied to anomalies interpolated via bivariate thin plate splines. Bioclimatic variables were computed in R using the terra package (Hijmans et al., 2023), with automatic unit conversion from Kelvin to °C and from kg m⁻² s⁻¹ to mm month⁻¹ where required. The pipeline — not the rasters — is what this repository provides: a reproducible, scripted workflow that any user can run on the CSIRO source data to obtain analysis-ready bioclimatic layers. The dataset is rigorously quality-controlled for spatial alignment and multicollinearity, providing a modeling-ready foundation for species distribution modeling (SDM), climate vulnerability assessments, and environmental applications across the Eastern Himalayas.
This pipeline is developed for Bhutanese ecologists, conservationists, climatologists, environmental scientists, agricultural researchers, forestry professionals, wildlife biologists, land-use planners, and policymakers—as well as the broader international research community—interested in climate modeling, SDM, habitat suitability analysis, climate change vulnerability assessments, and conservation planning in the Himalayan region.
Why this pipelineRepository Value Proposition
Raw monthly climate variables (precipitation, Tmax, Tmin) requiring substantial preprocessing before application in SDM workflows.
A reproducible pipeline that computes bioclimatic variables, multi-model ensemble products, and quality-controlled model-ready layers from the CSIRO source data. Download the source data, run the pipeline, and obtain analysis-ready outputs — no preprocessing code to write.
Comparative Feature Analysis
| Feature | Source CSIRO Dataset | BhutanBioClims v1.0 |
|---|---|---|
| BIOCLIM Variables | Not provided — user computation required | ✓ Included 19 variables computed by pipeline for all GCM/SSP combinations |
| Ensemble Products | Single GCM files only — manual aggregation required | ✓ Included Ensemble mean, SD, minimum, and maximum across 10 GCMs |
| Quality Assurance | Not included in source dataset release | ✓ Validated QC reports in 06_quality_control/ covering alignment, CRS, extent, resolution, and NA checks |
| Multicollinearity Assessment | Not provided | ✓ Included VIF analysis for predictor variable screening |
| Model-Ready Products | Manual stacking and formatting required | ✓ Pipeline provided Outputs are standard GeoTIFFs loadable in MaxEnt, biomod2, and other SDM platforms — see §6.4 for loading examples |
| Technical Documentation | Basic README file | ✓ Comprehensive Interactive guide, detailed README, complete citation framework |
| Institutional Custodianship | 🇦🇺 Australian institutions (CSIRO, UQ) | 🇧🇹 Kingdom of Bhutan — Ministry of Energy and Natural Resources |
| Time to Analysis | Substantial manual preprocessing required from scratch | ✓ All computation automated by pipeline scripts |
Preprocessing Time Comparison
- Download and organize all monthly CMIP6 source files
- Write code to compute 19 BIOCLIM variables per GCM/SSP/period
- Validate spatial alignment & CRS consistency
- Construct multi-model ensemble statistics
- Perform multicollinearity diagnostics (VIF)
- Prepare model-ready raster stacks
- Substantial preprocessing effort required before analysis can begin
- Download source data from CSIRO (Dorji et al., 2025)
- Place monthly rasters in
01_raw_cmip6_data/01_cmip6_monthly/preserving the GCM/period/SSP folder structure - Run:
Rscript 10_scripts/r/bioclim_master.R --input_root ./01_raw_cmip6_data/01_cmip6_monthly --output_root ./03_bioclim_variables/01_bioclim_by_gcm - ✓ BIO1–BIO19 computed automatically for all GCMs, SSPs, and time periods
- ✓ Run
Rscript 10_scripts/r/build_ensembles.Rto generate ensemble products (mean, SD, min, max) - ✓ Run
Rscript 10_scripts/r/qc_validation.Rfor spatial alignment and CRS checks - ✓ Run
Rscript 10_scripts/r/multicollinearity_screening.Rfor VIF predictor screening - All steps fully scripted — no preprocessing code to write
Primary Beneficiaries
Use the ready-to-run pipeline to process CSIRO source data without writing any preprocessing code. Focus on ecological interpretation rather than computational infrastructure.
Execute rapid climate vulnerability assessments for protected area networks. Compare SSP scenarios without specialized technical capacity.
Generate policy-informing projections with quantified uncertainty (ensemble mean ± SD). Produce defensible reports for national adaptation planning.
Access reproducible workflow from raw CMIP6 to analysis-ready SDM layers. Cite both Dorji et al. (2025) for source data and Wangdi (2026) for this pipeline.
The CSIRO dataset provides raw climate variables (ingredients). BhutanBioClims v1.0 provides the R pipeline to compute analysis-ready bioclimatic products (finished output). To get started:
- Download source climate grids from https://doi.org/10.25919/pec2-hs50
- Place the monthly rasters in
01_raw_cmip6_data/01_cmip6_monthly/(preserving the GCM/period/SSP folder structure) - Run the pipeline:
Rscript 10_scripts/r/bioclim_master.R --input_root ./01_raw_cmip6_data/01_cmip6_monthly --output_root ./03_bioclim_variables/01_bioclim_by_gcm
Unit conversion and BIO1–BIO19 computation are handled by bioclim_master.R. Ensemble aggregation, QC validation, and multicollinearity screening each have dedicated scripts — all fully scripted, no code to write from scratch.
Dataset propertiesCore Metadata
| Version | 1.0.0 | Last Updated | 2026-04-23 |
|---|---|---|---|
| Geographic Extent (WGS84) | 26°42′N – 28°12′N, 88°42′E – 92°9′E (Bhutan National Boundary) | ||
| Spatial Extent (EPSG:5266) | xmin: 125,694.5 | ymin: 2,954,438 | xmax: 460,694.5 | ymax: 3,125,938 | ||
| CRS / EPSG | DRUKREF 03 / Bhutan National Grid (EPSG:5266) | Datum | D_DRUKREF_03 |
| Pixel Size | 250 m (approx. 9 arc-seconds) | Grid Alignment | Validated via qc_validation.R |
| Nodata Value | −9999 | Units | °C (Temperature) · mm (Precipitation) |
| Temporal Coverage | 1986 – 2100 | Time Slices | Historical · 2021–50 · 2051–80 · 2071–2100 |
| Scenarios | SSP1-2.6 · SSP2-4.5 · SSP3-7.0 · SSP5-8.5 | ||
| GCMs (n=10) | ACCESS-CM2 · CNRM-CM6-1 · CNRM-ESM2-1 · INM-CM4-8 · INM-CM5-0 · MIROC-ES2L · MIROC6 · MPI-ESM1-2-LR · MRI-ESM2-0 · NorESM2-MM | ||
| Bias Correction | Applied by Dorji et al. (2025) — specific method documented in source publication | ||
| Downscaling Method | Statistical downscaling to 250 m per Dorji et al. (2025) — delta change method, bivariate thin plate splines | ||
| Ensemble Weighting | Equal weighting — unweighted arithmetic mean across 10 GCMs | ||
Processing protocolMethods
3.1 Source Data & Preprocessing
The primary inputs are monthly mean temperature, minimum temperature, maximum temperature, and precipitation grids obtained from Dorji et al. (2025). Source data were generated via downscaling to a 250 m resolution. Monthly files are organized by GCM, SSP, and time slice in the 01_raw_cmip6_data directory.
3.2 Bias Correction
Bias correction was applied by Dorji et al. (2025) prior to distribution of the source dataset. The specific bias correction methodology is documented in the source publication. No additional bias correction was applied in this pipeline beyond what was provided in the source data.
3.3 Downscaling
The source data from Dorji et al. (2025) were downscaled to 250 m resolution using statistical downscaling methods. The source dataset provides monthly temperature (mean, minimum, maximum) and precipitation grids for Bhutan. Details of the downscaling technique are available in the source publication.
3.4 BIOCLIM Derivation
The 19 standard bioclimatic variables (BIO1–BIO19) were computed from monthly tasmin, tasmax, and pr rasters using the terra R package (Hijmans et al., 2023), following the standard definitions of O'Donnell & Ignizio (2012) and Nix (1986). These variables capture annual trends, seasonality, and extreme or limiting environmental factors.
Unit conversions are applied automatically by the pipeline: temperatures exceeding 150 (indicating Kelvin) are shifted by −273.15 to Celsius; precipitation values below 5 (indicating kg m⁻² s⁻¹) are multiplied by 2,628,000 to obtain mm month⁻¹. Quarterly variables (BIO8–BIO11, BIO16–BIO19) are computed using all 12 rolling 3-month windows via pixel-wise index operations — no approximation to fixed calendar quarters.
3.5 Ensemble Aggregation
Multi-model ensembles are calculated across the 10 GCMs. Aggregation uses equal (unweighted) weighting for all models, computing the arithmetic mean across all available GCMs for each time slice, scenario, and BIOCLIM variable.
3.6 Quality Control Acceptance Criteria
A file is only promoted to release status if it meets all criteria:
- Alignment: 100% pixel-perfect alignment with the Bhutan National Mask — verified by
qc_validation.R - CRS: Correct projection (EPSG:5266 — DRUKREF 03) embedded in metadata
- Extent: xmin: 125,694.5 · ymin: 2,954,438 · xmax: 460,694.5 · ymax: 3,125,938 (EPSG:5266)
- Completeness: No missing pixels within the Bhutan administrative boundary
- Range Check: Values within physically plausible bounds (e.g., BIO01 between −30°C and +40°C)
- Integrity: File integrity verified — see
06_quality_control/integrity_summary.json
Pipeline structureFolder & Variable Architecture
The repository follows a structured 10-stage pipeline to ensure data provenance and reproducibility.
| Stage | Contents | Role in Modeling |
|---|---|---|
00_project_metadata | Manifests, citations, and guides | Metadata and provenance |
01_raw_cmip6_data | Source monthly rasters from CSIRO | Pipeline input (place downloaded data here) |
03_bioclim_variables | Per-GCM BIOCLIM rasters | GCM variability analysis |
04_ensemble_products | Ensemble mean, standard deviation, minimum, maximum — one subdirectory per statistic | Primary source for projections and uncertainty assessment |
05_multicollinearity_analysis | VIF and correlation reports | Predictor screening |
06_quality_control | Alignment and integrity reports | QC verification |
08_model_ready_layers | Placeholder for SDM-ready predictor stacks after QA and VIF selection | Intended destination for curated SDM-ready outputs |
10_scripts | R and PowerShell pipeline scripts | All processing automation |
For the full list, see variable_dictionary.csv.
| Code | Variable Name | Units | Description |
|---|---|---|---|
BIO01 | Annual Mean Temperature | °C | Average of monthly mean temperatures |
BIO04 | Temperature Seasonality | unitless | Standard deviation of monthly mean temp × 100 |
BIO12 | Annual Precipitation | mm | Sum of all monthly precipitation values |
BIO15 | Precipitation Seasonality | % | Coefficient of variation of monthly precipitation |
Pipeline Scripts
| Script | Location | Purpose |
|---|---|---|
bioclim_master.R | 10_scripts/r/ | Main entry point — computes BIO1–BIO19 from monthly rasters and runs ensemble aggregation for all GCMs, SSPs, and time periods |
build_ensembles.R | 10_scripts/r/ | Reads per-GCM outputs from 03_bioclim_variables/01_bioclim_by_gcm and writes ensemble mean, standard deviation, minimum, and maximum to 04_ensemble_products/ |
multicollinearity_screening.R | 10_scripts/r/ | Pearson correlation matrix and iterative VIF analysis on ensemble mean layers for predictor screening; reads from 04_ensemble_products/ensemble_mean/ |
qc_validation.R | 10_scripts/r/ | Spatial alignment, CRS, extent, resolution, and NA-proportion checks across all output rasters in 03_bioclim_variables/01_bioclim_by_gcm |
Running the Pipeline
The main entry point is bioclim_master.R. It auto-detects all GCMs, time periods, and SSP scenarios from the input directory structure — no configuration file needed.
# Minimal — auto-detect all models, periods, and scenarios
Rscript 10_scripts/r/bioclim_master.R \
--input_root ./01_raw_cmip6_data/01_cmip6_monthly \
--output_root ./03_bioclim_variables/01_bioclim_by_gcm
# Selective run — specify models and scenarios, tune memory
Rscript 10_scripts/r/bioclim_master.R \
--input_root ./01_raw_cmip6_data/01_cmip6_monthly \
--output_root ./03_bioclim_variables/01_bioclim_by_gcm \
--models "ACCES_CM2,MIROC6,NorESM2-MM" \
--scenarios "SSP245,SSP585" \
--overwrite TRUE \
--memfrac 0.5 \
--tempdir "D:/terra_scratch"
# Run multicollinearity screening after BIO computation
Rscript 10_scripts/r/multicollinearity_screening.R \
--project_root . \
--time_slice "1986_2015" \
--scenario "historical"
| Argument | Default | Description |
|---|---|---|
--input_root | required | Root directory containing CMIP6 monthly rasters (the 01_raw_cmip6_data/01_cmip6_monthly folder) |
--output_root | required | Destination root for BIO1–BIO19 GeoTIFF outputs |
--models | auto-detect | Comma-separated GCM folder names (e.g., ACCES_CM2,MIROC6) |
--periods | auto-detect | Comma-separated time periods (e.g., 1986-2015,2021-2050) |
--scenarios | auto-detect | Comma-separated SSP codes (e.g., SSP245,SSP585) |
--ensemble_stats | mean,median,min,max,sd | Statistics to compute for multi-model ensemble |
--skip_ensemble | FALSE | Skip ensemble step (use when running per-GCM outputs only) |
--overwrite | FALSE | Overwrite existing output files |
--memfrac | 0.3 | Fraction of RAM allocated to terra; increase on high-RAM systems |
--tempdir | system temp | Scratch directory for terra disk-backed operations; redirect to a fast disk if possible |
--log_root | <output_root>/_logs | Directory for timestamped run logs and summary CSV |
Output Filename Convention & GeoTIFF Specifications
All output rasters use a standardized naming pattern that encodes model, scenario, period, and variable:
Example: bhutan_cmip6_access_cm2_ssp245_2021_2050_bio01_v1_0.tif
| GeoTIFF Specification | Value |
|---|---|
| Compression | DEFLATE · Predictor = 2 (horizontal differencing) |
| Tiling | YES — optimized for spatial subsetting without full file read |
| No-data flag | −9999 |
| CRS | EPSG:5266 — DRUKREF 03 / Bhutan National Grid |
| Pixel size | 250 m |
| Memory strategy | Disk-backed terra operations; never loads the full dataset into RAM |
Ensemble interpretationUncertainty & Interpretation
Ensemble Mean (ensemble_mean): Represents the consensus trajectory. Recommended for initial model runs and general assessments.
Standard Deviation (ensemble_standard_deviation): Quantifies GCM disagreement. Areas with high SD (e.g., high-altitude precipitation) should be interpreted with caution.
Reporting Uncertainty: When publishing SDM results, report the mean prediction alongside maps showing the range (Min/Max) or uncertainty (SD) to avoid overconfident projections.
Do not overinterpret the 250 m resolution. While the grid is fine-scaled, the underlying CMIP6 signals originate from much coarser GCMs. The local detail is a product of downscaling assumptions and should not be mistaken for native GCM resolution.
Recommended workflowModeling Best Practices
6.1 Multicollinearity Screening
To avoid overfitting, screen predictors before training models:
- Consult
05_multicollinearity_analysis/vif_results.csv - VIF < 5: Preferred for high-rigor models
- VIF < 10: Acceptable but requires justification
- Always prioritize ecological relevance over statistical correlation alone
6.2 Spatial Autocorrelation
High-resolution grids often exhibit strong spatial autocorrelation. We recommend:
- Spatial Thinning: Ensure occurrence points are not clustered within a single 250 m cell
- Block Cross-Validation: Use spatial blocks for model validation rather than random k-fold
6.3 Scenario Reporting Template
When reporting results, use a template such as:
6.4 Loading Pipeline Outputs in R
All pipeline outputs are standard compressed GeoTIFFs loadable with any R spatial library. Use terra (recommended, matches the pipeline) or sf/stars as needed.
library(terra)
# ── Single ensemble mean raster (historical BIO01) ───────────────────
# Filename pattern: bhutan_cmip6_ensemble_{stat}_{ssp}_{period}_{bio}_v1_0.tif
bio01_hist <- rast(
"04_ensemble_products/ensemble_mean/1986_2015/historical/bhutan_cmip6_ensemble_mean_historical_1986_2015_bio01_v1_0.tif"
)
# ── Full 19-variable ensemble mean stack for a future scenario ────────
ssp245_2050 <- rast(
list.files(
"04_ensemble_products/ensemble_mean/2021_2050/ssp245/",
pattern = "^bhutan_cmip6_ensemble_mean.*\\.tif$",
full.names = TRUE
)
)
# ── Ensemble standard deviation (uncertainty map) ─────────────────────
bio01_sd <- rast(
"04_ensemble_products/ensemble_standard_deviation/2021_2050/ssp245/bhutan_cmip6_ensemble_standard_deviation_ssp245_2021_2050_bio01_v1_0.tif"
)
# ── VIF-screened predictor set for SDM ───────────────────────────────
# Consult 05_multicollinearity_analysis/selected_predictors.txt for retained variables.
# Load only those from the ensemble mean directory:
selected_vars <- readLines("05_multicollinearity_analysis/selected_predictors.txt")
mean_files <- list.files(
"04_ensemble_products/ensemble_mean/1986_2015/historical/",
pattern = "\\.tif$", full.names = TRUE
)
sdm_predictors <- rast(
mean_files[sapply(mean_files, function(f)
any(sapply(selected_vars, function(v) grepl(v, basename(f), fixed = TRUE)))
)]
)
# ── Stack individual GCM outputs to quantify inter-model spread ───────
# Per-GCM file pattern: bhutan_cmip6_{model}_{scenario}_{period}_{bio}_v1_0.tif
gcm_bio01 <- rast(
list.files(
"03_bioclim_variables/01_bioclim_by_gcm/",
pattern = "bio01_v1_0\\.tif$",
recursive = TRUE, full.names = TRUE
)
)
bio01_range <- max(gcm_bio01) - min(gcm_bio01) # pixel-wise inter-model range
# ── Extract values at species occurrence points ───────────────────────
# occ_pts <- vect("occurrences.gpkg")
# climate_vals <- extract(sdm_predictors, occ_pts)
Check 05_multicollinearity_analysis/vif_results.csv and use only the retained variable set listed there. Loading all 19 variables without screening will introduce collinearity into your SDM.
Known constraintsLimitations
- Scenario Uncertainty: Future trajectories are based on socioeconomic assumptions that may change
- GCM Structural Uncertainty: Different models simulate climate physics differently (partially addressed by our 10-model ensemble)
- Downscaling Assumptions: Methods assume historical climate relationships (e.g., lapse rates) remain constant in the future
- Temporal Mismatch: 30-year averages may mask short-term climate extremes or decadal variability
- Pseudo-precision: High resolution (250 m) does not imply high accuracy in complex terrain where station data is sparse
Scope of applicationIntended & Not Intended Use
- Regional and national species distribution modeling (SDM)
- Climate change vulnerability and impact assessments
- Strategic conservation planning and protected area gap analysis
- Ecological niche modeling and habitat suitability studies
- Academic research and institutional decision support
- Local-scale engineering or infrastructure design (e.g., dam building)
- Short-term weather forecasting
- Analysis of individual extreme weather events (e.g., specific GLOF events)
- Site-specific agricultural planning without local validation
- Legal or regulatory compliance without expert consultation
How to citeCitation & Attribution
Dorji, S., Stewart, S., Bajwa, A., Aziz, A., Shabbir, A., & Adkins, S. (2025). High-resolution (250 m) historical and projected (CMIP6) air temperature and precipitation grids for Bhutan (v1) [Data set]. CSIRO. https://doi.org/10.25919/pec2-hs50
Wangdi (2026). BhutanBioClims: CMIP6 Bioclimatic Variable Pipeline for Bhutan (v1.0). Forest Resources Planning and Management Division, Department of Forests and Park Services, Ministry of Energy and Natural Resources, Royal Government of Bhutan, Thimphu, Bhutan. GitHub repository
The source CSIRO dataset was developed by Sangay Dorji (University of Queensland) with advice and technical guidance from:
- Stephen Stewart — CSIRO Environment, Natural Capital Assessment
- Ali Bajwa — La Trobe University
- Asad Shabbir — NSW DPI
- Ammar Aziz — University of Queensland
- Steve Adkins — University of Queensland
"We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups for producing and making available their model output."
Wangdi (2026). Technical Guide & Modeling Protocol
for the Bhutan CMIP6 BIOCLIM Infrastructure (v1.0). Thimphu, Bhutan.
- Booth et al. (2014). bioclim: the first species distribution modelling package. Diversity and Distributions, 20(1), 1–9. doi:10.1111/ddi.12144
- O'Donnell, M.S. & Ignizio, D.A. (2012). Bioclimatic predictors for supporting ecological applications in the conterminous United States. U.S. Geological Survey Data Series 691.
- Nix, H.A. (1986). A biogeographic analysis of Australian elapid snakes. In: Longmore, R. (ed.), Atlas of Elapid Snakes of Australia, pp. 4–15. Australian Government Publishing Service, Canberra.
- Dorji et al. (2025). Comparative Analysis of Mechanistic and Correlative Models for Bhutan. Plants, 14(1). doi:10.3390/plants14010083
- Hijmans, R.J. et al. (2023). terra: Spatial Data Analysis. R package v1.7. CRAN
- O'Neill et al. (2016). The Scenario Model Intercomparison Project (ScenarioMIP) for CMIP6. Geosci. Model Dev., 9(9). doi:10.5194/gmd-9-3461-2016
- Riahi et al. (2017). The Shared Socioeconomic Pathways. Global Environmental Change, 42:153–168.
- Stewart et al. (2017). Topography and the north Indian monsoon on climate interpolation in Bhutan. Int. J. Climatology, 37(S1). doi:10.1002/joc.5045
- Stewart et al. (2021). Interpolated climate variables for Bhutan [Raster]. CSIRO. doi:10.25919/m8yh-gt42