Kingdom of Bhutan · Eastern Himalayas

Technical Guide & Modeling Protocol

High-resolution (250 m) historical and projected CMIP6 bioclimatic variable pipeline for Bhutan — built for species distribution modeling, climate vulnerability assessment, and conservation science.

0 BIO sets
0 Variables
0 GCMs
0 SSPs
250m Resolution
EPSG:5266 DRUKREF 03 1986 – 2100 Version 1.0.0 MIT Licensed
00

Abstract

This protocol describes the development and application of BhutanBioClims—a comprehensive, high-resolution (250 m) bioclimatic dataset for Bhutan derived from CMIP6 Global Climate Models (GCMs). The pipeline provides 121 sets of 19 standard BIOCLIM variables (Booth et al., 2014), spanning historical (1986–2015) and three future time slices (2021–2050, 2051–2080, 2071–2100) across four Shared Socioeconomic Pathways (SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5) using 10 GCMs.

CMIP6 GCM outputs were downscaled against historical data developed with Bhutan's national weather station network (Stewart et al., 2017; Stewart et al., 2021) using the delta change method applied to anomalies interpolated via bivariate thin plate splines. Bioclimatic variables were computed in R using the terra package (Hijmans et al., 2023), with automatic unit conversion from Kelvin to °C and from kg m⁻² s⁻¹ to mm month⁻¹ where required. The pipeline — not the rasters — is what this repository provides: a reproducible, scripted workflow that any user can run on the CSIRO source data to obtain analysis-ready bioclimatic layers. The dataset is rigorously quality-controlled for spatial alignment and multicollinearity, providing a modeling-ready foundation for species distribution modeling (SDM), climate vulnerability assessments, and environmental applications across the Eastern Himalayas.

Intended Audience

This pipeline is developed for Bhutanese ecologists, conservationists, climatologists, environmental scientists, agricultural researchers, forestry professionals, wildlife biologists, land-use planners, and policymakers—as well as the broader international research community—interested in climate modeling, SDM, habitat suitability analysis, climate change vulnerability assessments, and conservation planning in the Himalayan region.

01

Repository Value Proposition

Source Data: CSIRO (Dorji et al., 2025)

Raw monthly climate variables (precipitation, Tmax, Tmin) requiring substantial preprocessing before application in SDM workflows.

This Pipeline: BhutanBioClims v1.0

A reproducible pipeline that computes bioclimatic variables, multi-model ensemble products, and quality-controlled model-ready layers from the CSIRO source data. Download the source data, run the pipeline, and obtain analysis-ready outputs — no preprocessing code to write.

Comparative Feature Analysis

Feature Source CSIRO Dataset BhutanBioClims v1.0
BIOCLIM Variables Not provided — user computation required ✓ Included 19 variables computed by pipeline for all GCM/SSP combinations
Ensemble Products Single GCM files only — manual aggregation required ✓ Included Ensemble mean, SD, minimum, and maximum across 10 GCMs
Quality Assurance Not included in source dataset release ✓ Validated QC reports in 06_quality_control/ covering alignment, CRS, extent, resolution, and NA checks
Multicollinearity Assessment Not provided ✓ Included VIF analysis for predictor variable screening
Model-Ready Products Manual stacking and formatting required ✓ Pipeline provided Outputs are standard GeoTIFFs loadable in MaxEnt, biomod2, and other SDM platforms — see §6.4 for loading examples
Technical Documentation Basic README file ✓ Comprehensive Interactive guide, detailed README, complete citation framework
Institutional Custodianship 🇦🇺 Australian institutions (CSIRO, UQ) 🇧🇹 Kingdom of Bhutan — Ministry of Energy and Natural Resources
Time to Analysis Substantial manual preprocessing required from scratch ✓ All computation automated by pipeline scripts

Preprocessing Time Comparison

Without This Pipeline
  • Download and organize all monthly CMIP6 source files
  • Write code to compute 19 BIOCLIM variables per GCM/SSP/period
  • Validate spatial alignment & CRS consistency
  • Construct multi-model ensemble statistics
  • Perform multicollinearity diagnostics (VIF)
  • Prepare model-ready raster stacks
  • Substantial preprocessing effort required before analysis can begin
Using BhutanBioClims v1.0 Pipeline
  • Download source data from CSIRO (Dorji et al., 2025)
  • Place monthly rasters in 01_raw_cmip6_data/01_cmip6_monthly/ preserving the GCM/period/SSP folder structure
  • Run: Rscript 10_scripts/r/bioclim_master.R --input_root ./01_raw_cmip6_data/01_cmip6_monthly --output_root ./03_bioclim_variables/01_bioclim_by_gcm
  • BIO1–BIO19 computed automatically for all GCMs, SSPs, and time periods
  • Run Rscript 10_scripts/r/build_ensembles.R to generate ensemble products (mean, SD, min, max)
  • Run Rscript 10_scripts/r/qc_validation.R for spatial alignment and CRS checks
  • Run Rscript 10_scripts/r/multicollinearity_screening.R for VIF predictor screening
  • All steps fully scripted — no preprocessing code to write

Primary Beneficiaries

Graduate Researchers

Use the ready-to-run pipeline to process CSIRO source data without writing any preprocessing code. Focus on ecological interpretation rather than computational infrastructure.

Conservation Organizations

Execute rapid climate vulnerability assessments for protected area networks. Compare SSP scenarios without specialized technical capacity.

Government & Policy

Generate policy-informing projections with quantified uncertainty (ensemble mean ± SD). Produce defensible reports for national adaptation planning.

International Research

Access reproducible workflow from raw CMIP6 to analysis-ready SDM layers. Cite both Dorji et al. (2025) for source data and Wangdi (2026) for this pipeline.

Getting Started — This repository shares code, not rasters

The CSIRO dataset provides raw climate variables (ingredients). BhutanBioClims v1.0 provides the R pipeline to compute analysis-ready bioclimatic products (finished output). To get started:

  1. Download source climate grids from https://doi.org/10.25919/pec2-hs50
  2. Place the monthly rasters in 01_raw_cmip6_data/01_cmip6_monthly/ (preserving the GCM/period/SSP folder structure)
  3. Run the pipeline: Rscript 10_scripts/r/bioclim_master.R --input_root ./01_raw_cmip6_data/01_cmip6_monthly --output_root ./03_bioclim_variables/01_bioclim_by_gcm

Unit conversion and BIO1–BIO19 computation are handled by bioclim_master.R. Ensemble aggregation, QC validation, and multicollinearity screening each have dedicated scripts — all fully scripted, no code to write from scratch.

02

Core Metadata

Version1.0.0Last Updated2026-04-23
Geographic Extent (WGS84)26°42′N – 28°12′N, 88°42′E – 92°9′E (Bhutan National Boundary)
Spatial Extent (EPSG:5266)xmin: 125,694.5  |  ymin: 2,954,438  |  xmax: 460,694.5  |  ymax: 3,125,938
CRS / EPSGDRUKREF 03 / Bhutan National Grid (EPSG:5266)DatumD_DRUKREF_03
Pixel Size250 m (approx. 9 arc-seconds)Grid AlignmentValidated via qc_validation.R
Nodata Value−9999Units°C (Temperature) · mm (Precipitation)
Temporal Coverage1986 – 2100Time SlicesHistorical · 2021–50 · 2051–80 · 2071–2100
ScenariosSSP1-2.6 · SSP2-4.5 · SSP3-7.0 · SSP5-8.5
GCMs (n=10)ACCESS-CM2 · CNRM-CM6-1 · CNRM-ESM2-1 · INM-CM4-8 · INM-CM5-0 · MIROC-ES2L · MIROC6 · MPI-ESM1-2-LR · MRI-ESM2-0 · NorESM2-MM
Bias CorrectionApplied by Dorji et al. (2025) — specific method documented in source publication
Downscaling MethodStatistical downscaling to 250 m per Dorji et al. (2025) — delta change method, bivariate thin plate splines
Ensemble WeightingEqual weighting — unweighted arithmetic mean across 10 GCMs
03

Methods

3.1 Source Data & Preprocessing

The primary inputs are monthly mean temperature, minimum temperature, maximum temperature, and precipitation grids obtained from Dorji et al. (2025). Source data were generated via downscaling to a 250 m resolution. Monthly files are organized by GCM, SSP, and time slice in the 01_raw_cmip6_data directory.

3.2 Bias Correction

Source Data Processing

Bias correction was applied by Dorji et al. (2025) prior to distribution of the source dataset. The specific bias correction methodology is documented in the source publication. No additional bias correction was applied in this pipeline beyond what was provided in the source data.

3.3 Downscaling

The source data from Dorji et al. (2025) were downscaled to 250 m resolution using statistical downscaling methods. The source dataset provides monthly temperature (mean, minimum, maximum) and precipitation grids for Bhutan. Details of the downscaling technique are available in the source publication.

3.4 BIOCLIM Derivation

The 19 standard bioclimatic variables (BIO1–BIO19) were computed from monthly tasmin, tasmax, and pr rasters using the terra R package (Hijmans et al., 2023), following the standard definitions of O'Donnell & Ignizio (2012) and Nix (1986). These variables capture annual trends, seasonality, and extreme or limiting environmental factors.

Unit conversions are applied automatically by the pipeline: temperatures exceeding 150 (indicating Kelvin) are shifted by −273.15 to Celsius; precipitation values below 5 (indicating kg m⁻² s⁻¹) are multiplied by 2,628,000 to obtain mm month⁻¹. Quarterly variables (BIO8–BIO11, BIO16–BIO19) are computed using all 12 rolling 3-month windows via pixel-wise index operations — no approximation to fixed calendar quarters.

3.5 Ensemble Aggregation

Multi-model ensembles are calculated across the 10 GCMs. Aggregation uses equal (unweighted) weighting for all models, computing the arithmetic mean across all available GCMs for each time slice, scenario, and BIOCLIM variable.

Ensemble Mean: μ = (1/n) × Σ GCMi
Standard Deviation: σ = √[(1/n) × Σ (GCMi − μ)²]

3.6 Quality Control Acceptance Criteria

A file is only promoted to release status if it meets all criteria:

  • Alignment: 100% pixel-perfect alignment with the Bhutan National Mask — verified by qc_validation.R
  • CRS: Correct projection (EPSG:5266 — DRUKREF 03) embedded in metadata
  • Extent: xmin: 125,694.5 · ymin: 2,954,438 · xmax: 460,694.5 · ymax: 3,125,938 (EPSG:5266)
  • Completeness: No missing pixels within the Bhutan administrative boundary
  • Range Check: Values within physically plausible bounds (e.g., BIO01 between −30°C and +40°C)
  • Integrity: File integrity verified — see 06_quality_control/integrity_summary.json
04

Folder & Variable Architecture

The repository follows a structured 10-stage pipeline to ensure data provenance and reproducibility.

StageContentsRole in Modeling
00_project_metadataManifests, citations, and guidesMetadata and provenance
01_raw_cmip6_dataSource monthly rasters from CSIROPipeline input (place downloaded data here)
03_bioclim_variablesPer-GCM BIOCLIM rastersGCM variability analysis
04_ensemble_productsEnsemble mean, standard deviation, minimum, maximum — one subdirectory per statisticPrimary source for projections and uncertainty assessment
05_multicollinearity_analysisVIF and correlation reportsPredictor screening
06_quality_controlAlignment and integrity reportsQC verification
08_model_ready_layersPlaceholder for SDM-ready predictor stacks after QA and VIF selectionIntended destination for curated SDM-ready outputs
10_scriptsR and PowerShell pipeline scriptsAll processing automation
BIOCLIM Variable Dictionary (Key Variables)

For the full list, see variable_dictionary.csv.

CodeVariable NameUnitsDescription
BIO01Annual Mean Temperature°CAverage of monthly mean temperatures
BIO04Temperature SeasonalityunitlessStandard deviation of monthly mean temp × 100
BIO12Annual PrecipitationmmSum of all monthly precipitation values
BIO15Precipitation Seasonality%Coefficient of variation of monthly precipitation

Pipeline Scripts

ScriptLocationPurpose
bioclim_master.R10_scripts/r/Main entry point — computes BIO1–BIO19 from monthly rasters and runs ensemble aggregation for all GCMs, SSPs, and time periods
build_ensembles.R10_scripts/r/Reads per-GCM outputs from 03_bioclim_variables/01_bioclim_by_gcm and writes ensemble mean, standard deviation, minimum, and maximum to 04_ensemble_products/
multicollinearity_screening.R10_scripts/r/Pearson correlation matrix and iterative VIF analysis on ensemble mean layers for predictor screening; reads from 04_ensemble_products/ensemble_mean/
qc_validation.R10_scripts/r/Spatial alignment, CRS, extent, resolution, and NA-proportion checks across all output rasters in 03_bioclim_variables/01_bioclim_by_gcm

Running the Pipeline

The main entry point is bioclim_master.R. It auto-detects all GCMs, time periods, and SSP scenarios from the input directory structure — no configuration file needed.

# Minimal — auto-detect all models, periods, and scenarios
Rscript 10_scripts/r/bioclim_master.R \
  --input_root ./01_raw_cmip6_data/01_cmip6_monthly \
  --output_root ./03_bioclim_variables/01_bioclim_by_gcm

# Selective run — specify models and scenarios, tune memory
Rscript 10_scripts/r/bioclim_master.R \
  --input_root ./01_raw_cmip6_data/01_cmip6_monthly \
  --output_root ./03_bioclim_variables/01_bioclim_by_gcm \
  --models "ACCES_CM2,MIROC6,NorESM2-MM" \
  --scenarios "SSP245,SSP585" \
  --overwrite TRUE \
  --memfrac 0.5 \
  --tempdir "D:/terra_scratch"

# Run multicollinearity screening after BIO computation
Rscript 10_scripts/r/multicollinearity_screening.R \
  --project_root . \
  --time_slice "1986_2015" \
  --scenario "historical"
ArgumentDefaultDescription
--input_rootrequiredRoot directory containing CMIP6 monthly rasters (the 01_raw_cmip6_data/01_cmip6_monthly folder)
--output_rootrequiredDestination root for BIO1–BIO19 GeoTIFF outputs
--modelsauto-detectComma-separated GCM folder names (e.g., ACCES_CM2,MIROC6)
--periodsauto-detectComma-separated time periods (e.g., 1986-2015,2021-2050)
--scenariosauto-detectComma-separated SSP codes (e.g., SSP245,SSP585)
--ensemble_statsmean,median,min,max,sdStatistics to compute for multi-model ensemble
--skip_ensembleFALSESkip ensemble step (use when running per-GCM outputs only)
--overwriteFALSEOverwrite existing output files
--memfrac0.3Fraction of RAM allocated to terra; increase on high-RAM systems
--tempdirsystem tempScratch directory for terra disk-backed operations; redirect to a fast disk if possible
--log_root<output_root>/_logsDirectory for timestamped run logs and summary CSV

Output Filename Convention & GeoTIFF Specifications

All output rasters use a standardized naming pattern that encodes model, scenario, period, and variable:

bhutan_cmip6_{model}_{scenario}_{period}_bio{nn}_v1_0.tif

Example: bhutan_cmip6_access_cm2_ssp245_2021_2050_bio01_v1_0.tif

GeoTIFF SpecificationValue
CompressionDEFLATE · Predictor = 2 (horizontal differencing)
TilingYES — optimized for spatial subsetting without full file read
No-data flag−9999
CRSEPSG:5266 — DRUKREF 03 / Bhutan National Grid
Pixel size250 m
Memory strategyDisk-backed terra operations; never loads the full dataset into RAM
05

Uncertainty & Interpretation

Ensemble Products Explained

Ensemble Mean (ensemble_mean): Represents the consensus trajectory. Recommended for initial model runs and general assessments.

Standard Deviation (ensemble_standard_deviation): Quantifies GCM disagreement. Areas with high SD (e.g., high-altitude precipitation) should be interpreted with caution.

Reporting Uncertainty: When publishing SDM results, report the mean prediction alongside maps showing the range (Min/Max) or uncertainty (SD) to avoid overconfident projections.

Resolution Caution

Do not overinterpret the 250 m resolution. While the grid is fine-scaled, the underlying CMIP6 signals originate from much coarser GCMs. The local detail is a product of downscaling assumptions and should not be mistaken for native GCM resolution.

06

Modeling Best Practices

6.1 Multicollinearity Screening

To avoid overfitting, screen predictors before training models:

  • Consult 05_multicollinearity_analysis/vif_results.csv
  • VIF < 5: Preferred for high-rigor models
  • VIF < 10: Acceptable but requires justification
  • Always prioritize ecological relevance over statistical correlation alone

6.2 Spatial Autocorrelation

High-resolution grids often exhibit strong spatial autocorrelation. We recommend:

  • Spatial Thinning: Ensure occurrence points are not clustered within a single 250 m cell
  • Block Cross-Validation: Use spatial blocks for model validation rather than random k-fold

6.3 Scenario Reporting Template

When reporting results, use a template such as:

"Predictions were generated for 2051–2080 under the SSP2-4.5 scenario, representing an intermediate emissions trajectory using a 10-GCM ensemble mean (Bhutan CMIP6 BIOCLIM v1.0)."

6.4 Loading Pipeline Outputs in R

All pipeline outputs are standard compressed GeoTIFFs loadable with any R spatial library. Use terra (recommended, matches the pipeline) or sf/stars as needed.

library(terra)

# ── Single ensemble mean raster (historical BIO01) ───────────────────
# Filename pattern: bhutan_cmip6_ensemble_{stat}_{ssp}_{period}_{bio}_v1_0.tif
bio01_hist <- rast(
  "04_ensemble_products/ensemble_mean/1986_2015/historical/bhutan_cmip6_ensemble_mean_historical_1986_2015_bio01_v1_0.tif"
)

# ── Full 19-variable ensemble mean stack for a future scenario ────────
ssp245_2050 <- rast(
  list.files(
    "04_ensemble_products/ensemble_mean/2021_2050/ssp245/",
    pattern = "^bhutan_cmip6_ensemble_mean.*\\.tif$",
    full.names = TRUE
  )
)

# ── Ensemble standard deviation (uncertainty map) ─────────────────────
bio01_sd <- rast(
  "04_ensemble_products/ensemble_standard_deviation/2021_2050/ssp245/bhutan_cmip6_ensemble_standard_deviation_ssp245_2021_2050_bio01_v1_0.tif"
)

# ── VIF-screened predictor set for SDM ───────────────────────────────
# Consult 05_multicollinearity_analysis/selected_predictors.txt for retained variables.
# Load only those from the ensemble mean directory:
selected_vars <- readLines("05_multicollinearity_analysis/selected_predictors.txt")
mean_files <- list.files(
  "04_ensemble_products/ensemble_mean/1986_2015/historical/",
  pattern = "\\.tif$", full.names = TRUE
)
sdm_predictors <- rast(
  mean_files[sapply(mean_files, function(f)
    any(sapply(selected_vars, function(v) grepl(v, basename(f), fixed = TRUE)))
  )]
)

# ── Stack individual GCM outputs to quantify inter-model spread ───────
# Per-GCM file pattern: bhutan_cmip6_{model}_{scenario}_{period}_{bio}_v1_0.tif
gcm_bio01 <- rast(
  list.files(
    "03_bioclim_variables/01_bioclim_by_gcm/",
    pattern = "bio01_v1_0\\.tif$",
    recursive = TRUE, full.names = TRUE
  )
)
bio01_range <- max(gcm_bio01) - min(gcm_bio01)  # pixel-wise inter-model range

# ── Extract values at species occurrence points ───────────────────────
# occ_pts <- vect("occurrences.gpkg")
# climate_vals <- extract(sdm_predictors, occ_pts)
Consult the VIF results before stacking predictors

Check 05_multicollinearity_analysis/vif_results.csv and use only the retained variable set listed there. Loading all 19 variables without screening will introduce collinearity into your SDM.

07

Limitations

Understanding Constraints
  • Scenario Uncertainty: Future trajectories are based on socioeconomic assumptions that may change
  • GCM Structural Uncertainty: Different models simulate climate physics differently (partially addressed by our 10-model ensemble)
  • Downscaling Assumptions: Methods assume historical climate relationships (e.g., lapse rates) remain constant in the future
  • Temporal Mismatch: 30-year averages may mask short-term climate extremes or decadal variability
  • Pseudo-precision: High resolution (250 m) does not imply high accuracy in complex terrain where station data is sparse
08

Intended & Not Intended Use

Intended Use
  • Regional and national species distribution modeling (SDM)
  • Climate change vulnerability and impact assessments
  • Strategic conservation planning and protected area gap analysis
  • Ecological niche modeling and habitat suitability studies
  • Academic research and institutional decision support
Not Intended Use
  • Local-scale engineering or infrastructure design (e.g., dam building)
  • Short-term weather forecasting
  • Analysis of individual extreme weather events (e.g., specific GLOF events)
  • Site-specific agricultural planning without local validation
  • Legal or regulatory compliance without expert consultation
09

Citation & Attribution

Primary Source — Required Citation (APA 7)

Dorji, S., Stewart, S., Bajwa, A., Aziz, A., Shabbir, A., & Adkins, S. (2025). High-resolution (250 m) historical and projected (CMIP6) air temperature and precipitation grids for Bhutan (v1) [Data set]. CSIRO. https://doi.org/10.25919/pec2-hs50

Pipeline Citation

Wangdi (2026). BhutanBioClims: CMIP6 Bioclimatic Variable Pipeline for Bhutan (v1.0). Forest Resources Planning and Management Division, Department of Forests and Park Services, Ministry of Energy and Natural Resources, Royal Government of Bhutan, Thimphu, Bhutan. GitHub repository

Credit & Source Development Team

The source CSIRO dataset was developed by Sangay Dorji (University of Queensland) with advice and technical guidance from:

  • Stephen Stewart — CSIRO Environment, Natural Capital Assessment
  • Ali Bajwa — La Trobe University
  • Asad Shabbir — NSW DPI
  • Ammar Aziz — University of Queensland
  • Steve Adkins — University of Queensland
CMIP6 Acknowledgment

"We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups for producing and making available their model output."

How to Cite This Protocol
Wangdi (2026). Technical Guide & Modeling Protocol
for the Bhutan CMIP6 BIOCLIM Infrastructure (v1.0). Thimphu, Bhutan.
Key References
  • Booth et al. (2014). bioclim: the first species distribution modelling package. Diversity and Distributions, 20(1), 1–9. doi:10.1111/ddi.12144
  • O'Donnell, M.S. & Ignizio, D.A. (2012). Bioclimatic predictors for supporting ecological applications in the conterminous United States. U.S. Geological Survey Data Series 691.
  • Nix, H.A. (1986). A biogeographic analysis of Australian elapid snakes. In: Longmore, R. (ed.), Atlas of Elapid Snakes of Australia, pp. 4–15. Australian Government Publishing Service, Canberra.
  • Dorji et al. (2025). Comparative Analysis of Mechanistic and Correlative Models for Bhutan. Plants, 14(1). doi:10.3390/plants14010083
  • Hijmans, R.J. et al. (2023). terra: Spatial Data Analysis. R package v1.7. CRAN
  • O'Neill et al. (2016). The Scenario Model Intercomparison Project (ScenarioMIP) for CMIP6. Geosci. Model Dev., 9(9). doi:10.5194/gmd-9-3461-2016
  • Riahi et al. (2017). The Shared Socioeconomic Pathways. Global Environmental Change, 42:153–168.
  • Stewart et al. (2017). Topography and the north Indian monsoon on climate interpolation in Bhutan. Int. J. Climatology, 37(S1). doi:10.1002/joc.5045
  • Stewart et al. (2021). Interpolated climate variables for Bhutan [Raster]. CSIRO. doi:10.25919/m8yh-gt42
10

Release Checklist

Completion Status 0%