Kingdom of Bhutan · Eastern Himalayas

Technical Guide & Modeling Protocol

High-resolution (250 m) historical and projected CMIP6 bioclimatic variable pipeline for Bhutan — built for species distribution modeling, climate vulnerability assessment, and conservation science.

0 BIO sets

0 Variables

0 GCMs

0 SSPs

250m Resolution

EPSG:5266 DRUKREF 03 1986 – 2100 Version 1.0.0 MIT Licensed

OverviewAbstract

This protocol describes the development and application of BhutanBioClims—a comprehensive, high-resolution (250 m) bioclimatic dataset for Bhutan derived from CMIP6 Global Climate Models (GCMs). The pipeline provides 121 sets of 19 standard BIOCLIM variables (Booth et al., 2014), spanning historical (1986–2015) and three future time slices (2021–2050, 2051–2080, 2071–2100) across four Shared Socioeconomic Pathways (SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5) using 10 GCMs.

CMIP6 GCM outputs were downscaled against historical data developed with Bhutan's national weather station network (Stewart et al., 2017; Stewart et al., 2021) using the delta change method applied to anomalies interpolated via bivariate thin plate splines. Bioclimatic variables were computed in R using the terra package (Hijmans et al., 2023), with automatic unit conversion from Kelvin to °C and from kg m⁻² s⁻¹ to mm month⁻¹ where required. The pipeline — not the rasters — is what this repository provides: a reproducible, scripted workflow that any user can run on the CSIRO source data to obtain analysis-ready bioclimatic layers. The dataset is rigorously quality-controlled for spatial alignment and multicollinearity, providing a modeling-ready foundation for species distribution modeling (SDM), climate vulnerability assessments, and environmental applications across the Eastern Himalayas.

Intended Audience

This pipeline is developed for Bhutanese ecologists, conservationists, climatologists, environmental scientists, agricultural researchers, forestry professionals, wildlife biologists, land-use planners, and policymakers—as well as the broader international research community—interested in climate modeling, SDM, habitat suitability analysis, climate change vulnerability assessments, and conservation planning in the Himalayan region.

Why this pipelineRepository Value Proposition

Source Data: CSIRO (Dorji et al., 2025)

Raw monthly climate variables (precipitation, Tmax, Tmin) requiring substantial preprocessing before application in SDM workflows.

This Pipeline: BhutanBioClims v1.0

A reproducible pipeline that computes bioclimatic variables, multi-model ensemble products, and quality-controlled model-ready layers from the CSIRO source data. Download the source data, run the pipeline, and obtain analysis-ready outputs — no preprocessing code to write.

Comparative Feature Analysis

Feature	Source CSIRO Dataset	BhutanBioClims v1.0
BIOCLIM Variables	Not provided — user computation required	✓ Included 19 variables computed by pipeline for all GCM/SSP combinations
Ensemble Products	Single GCM files only — manual aggregation required	✓ Included Ensemble mean, SD, minimum, and maximum across 10 GCMs
Quality Assurance	Not included in source dataset release	✓ Validated QC reports in `06_quality_control/` covering alignment, CRS, extent, resolution, and NA checks
Multicollinearity Assessment	Not provided	✓ Included VIF analysis for predictor variable screening
Model-Ready Products	Manual stacking and formatting required	✓ Pipeline provided Outputs are standard GeoTIFFs loadable in MaxEnt, biomod2, and other SDM platforms — see §6.4 for loading examples
Technical Documentation	Basic README file	✓ Comprehensive Interactive guide, detailed README, complete citation framework
Institutional Custodianship	🇦🇺 Australian institutions (CSIRO, UQ)	🇧🇹 Kingdom of Bhutan — Ministry of Energy and Natural Resources
Time to Analysis	Substantial manual preprocessing required from scratch	✓ All computation automated by pipeline scripts

Preprocessing Time Comparison

Without This Pipeline

Download and organize all monthly CMIP6 source files
Write code to compute 19 BIOCLIM variables per GCM/SSP/period
Validate spatial alignment & CRS consistency
Construct multi-model ensemble statistics
Perform multicollinearity diagnostics (VIF)
Prepare model-ready raster stacks
Substantial preprocessing effort required before analysis can begin

Using BhutanBioClims v1.0 Pipeline

Download source data from CSIRO (Dorji et al., 2025)
Place monthly rasters in 01_raw_cmip6_data/01_cmip6_monthly/ preserving the GCM/period/SSP folder structure
Run: Rscript 10_scripts/r/bioclim_master.R --input_root ./01_raw_cmip6_data/01_cmip6_monthly --output_root ./03_bioclim_variables/01_bioclim_by_gcm
✓ BIO1–BIO19 computed automatically for all GCMs, SSPs, and time periods
✓ Run Rscript 10_scripts/r/build_ensembles.R to generate ensemble products (mean, SD, min, max)
✓ Run Rscript 10_scripts/r/qc_validation.R for spatial alignment and CRS checks
✓ Run Rscript 10_scripts/r/multicollinearity_screening.R for VIF predictor screening
All steps fully scripted — no preprocessing code to write

Primary Beneficiaries

Graduate Researchers

Use the ready-to-run pipeline to process CSIRO source data without writing any preprocessing code. Focus on ecological interpretation rather than computational infrastructure.

Conservation Organizations

Execute rapid climate vulnerability assessments for protected area networks. Compare SSP scenarios without specialized technical capacity.

Government & Policy

Generate policy-informing projections with quantified uncertainty (ensemble mean ± SD). Produce defensible reports for national adaptation planning.

International Research

Access reproducible workflow from raw CMIP6 to analysis-ready SDM layers. Cite both Dorji et al. (2025) for source data and Wangdi (2026) for this pipeline.

Getting Started — This repository shares code, not rasters

The CSIRO dataset provides raw climate variables (ingredients). BhutanBioClims v1.0 provides the R pipeline to compute analysis-ready bioclimatic products (finished output). To get started:

Download source climate grids from https://doi.org/10.25919/pec2-hs50
Place the monthly rasters in 01_raw_cmip6_data/01_cmip6_monthly/ (preserving the GCM/period/SSP folder structure)
Run the pipeline: Rscript 10_scripts/r/bioclim_master.R --input_root ./01_raw_cmip6_data/01_cmip6_monthly --output_root ./03_bioclim_variables/01_bioclim_by_gcm

Unit conversion and BIO1–BIO19 computation are handled by bioclim_master.R. Ensemble aggregation, QC validation, and multicollinearity screening each have dedicated scripts — all fully scripted, no code to write from scratch.

Dataset propertiesCore Metadata

Version	1.0.0	Last Updated	2026-04-23
Geographic Extent (WGS84)	26°42′N – 28°12′N, 88°42′E – 92°9′E (Bhutan National Boundary)
Spatial Extent (EPSG:5266)	xmin: 125,694.5 \| ymin: 2,954,438 \| xmax: 460,694.5 \| ymax: 3,125,938
CRS / EPSG	DRUKREF 03 / Bhutan National Grid (EPSG:5266)	Datum	D_DRUKREF_03
Pixel Size	250 m (approx. 9 arc-seconds)	Grid Alignment	Validated via `qc_validation.R`
Nodata Value	−9999	Units	°C (Temperature) · mm (Precipitation)
Temporal Coverage	1986 – 2100	Time Slices	Historical · 2021–50 · 2051–80 · 2071–2100
Scenarios	SSP1-2.6 · SSP2-4.5 · SSP3-7.0 · SSP5-8.5
GCMs (n=10)	ACCESS-CM2 · CNRM-CM6-1 · CNRM-ESM2-1 · INM-CM4-8 · INM-CM5-0 · MIROC-ES2L · MIROC6 · MPI-ESM1-2-LR · MRI-ESM2-0 · NorESM2-MM
Bias Correction	Applied by Dorji et al. (2025) — specific method documented in source publication
Downscaling Method	Statistical downscaling to 250 m per Dorji et al. (2025) — delta change method, bivariate thin plate splines
Ensemble Weighting	Equal weighting — unweighted arithmetic mean across 10 GCMs

Processing protocolMethods

3.1 Source Data & Preprocessing

The primary inputs are monthly mean temperature, minimum temperature, maximum temperature, and precipitation grids obtained from Dorji et al. (2025). Source data were generated via downscaling to a 250 m resolution. Monthly files are organized by GCM, SSP, and time slice in the 01_raw_cmip6_data directory.

3.2 Bias Correction

Source Data Processing

Bias correction was applied by Dorji et al. (2025) prior to distribution of the source dataset. The specific bias correction methodology is documented in the source publication. No additional bias correction was applied in this pipeline beyond what was provided in the source data.

3.3 Downscaling

The source data from Dorji et al. (2025) were downscaled to 250 m resolution using statistical downscaling methods. The source dataset provides monthly temperature (mean, minimum, maximum) and precipitation grids for Bhutan. Details of the downscaling technique are available in the source publication.

3.4 BIOCLIM Derivation

The 19 standard bioclimatic variables (BIO1–BIO19) were computed from monthly tasmin, tasmax, and pr rasters using the terra R package (Hijmans et al., 2023), following the standard definitions of O'Donnell & Ignizio (2012) and Nix (1986). These variables capture annual trends, seasonality, and extreme or limiting environmental factors.

Unit conversions are applied automatically by the pipeline: temperatures exceeding 150 (indicating Kelvin) are shifted by −273.15 to Celsius; precipitation values below 5 (indicating kg m⁻² s⁻¹) are multiplied by 2,628,000 to obtain mm month⁻¹. Quarterly variables (BIO8–BIO11, BIO16–BIO19) are computed using all 12 rolling 3-month windows via pixel-wise index operations — no approximation to fixed calendar quarters.

3.5 Ensemble Aggregation

Multi-model ensembles are calculated across the 10 GCMs. Aggregation uses equal (unweighted) weighting for all models, computing the arithmetic mean across all available GCMs for each time slice, scenario, and BIOCLIM variable.

Ensemble Mean: μ = (1/n) × Σ GCM_i

Standard Deviation: σ = √[(1/n) × Σ (GCM_i − μ)²]

3.6 Quality Control Acceptance Criteria

A file is only promoted to release status if it meets all criteria:

Alignment: 100% pixel-perfect alignment with the Bhutan National Mask — verified by qc_validation.R
CRS: Correct projection (EPSG:5266 — DRUKREF 03) embedded in metadata
Extent: xmin: 125,694.5 · ymin: 2,954,438 · xmax: 460,694.5 · ymax: 3,125,938 (EPSG:5266)
Completeness: No missing pixels within the Bhutan administrative boundary
Range Check: Values within physically plausible bounds (e.g., BIO01 between −30°C and +40°C)
Integrity: File integrity verified — see 06_quality_control/integrity_summary.json

Pipeline structureFolder & Variable Architecture

The repository follows a structured 10-stage pipeline to ensure data provenance and reproducibility.

Stage	Contents	Role in Modeling
`00_project_metadata`	Manifests, citations, and guides	Metadata and provenance
`01_raw_cmip6_data`	Source monthly rasters from CSIRO	Pipeline input (place downloaded data here)
`03_bioclim_variables`	Per-GCM BIOCLIM rasters	GCM variability analysis
`04_ensemble_products`	Ensemble mean, standard deviation, minimum, maximum — one subdirectory per statistic	Primary source for projections and uncertainty assessment
`05_multicollinearity_analysis`	VIF and correlation reports	Predictor screening
`06_quality_control`	Alignment and integrity reports	QC verification
`08_model_ready_layers`	Placeholder for SDM-ready predictor stacks after QA and VIF selection	Intended destination for curated SDM-ready outputs
`10_scripts`	R and PowerShell pipeline scripts	All processing automation

BIOCLIM Variable Dictionary (Key Variables)

For the full list, see variable_dictionary.csv.

Code	Variable Name	Units	Description
`BIO01`	Annual Mean Temperature	°C	Average of monthly mean temperatures
`BIO04`	Temperature Seasonality	unitless	Standard deviation of monthly mean temp × 100
`BIO12`	Annual Precipitation	mm	Sum of all monthly precipitation values
`BIO15`	Precipitation Seasonality	%	Coefficient of variation of monthly precipitation

Pipeline Scripts

Script	Location	Purpose
`bioclim_master.R`	`10_scripts/r/`	Main entry point — computes BIO1–BIO19 from monthly rasters and runs ensemble aggregation for all GCMs, SSPs, and time periods
`build_ensembles.R`	`10_scripts/r/`	Reads per-GCM outputs from `03_bioclim_variables/01_bioclim_by_gcm` and writes ensemble mean, standard deviation, minimum, and maximum to `04_ensemble_products/`
`multicollinearity_screening.R`	`10_scripts/r/`	Pearson correlation matrix and iterative VIF analysis on ensemble mean layers for predictor screening; reads from `04_ensemble_products/ensemble_mean/`
`qc_validation.R`	`10_scripts/r/`	Spatial alignment, CRS, extent, resolution, and NA-proportion checks across all output rasters in `03_bioclim_variables/01_bioclim_by_gcm`

Running the Pipeline

The main entry point is bioclim_master.R. It auto-detects all GCMs, time periods, and SSP scenarios from the input directory structure — no configuration file needed.

# Minimal — auto-detect all models, periods, and scenarios
Rscript 10_scripts/r/bioclim_master.R \
  --input_root ./01_raw_cmip6_data/01_cmip6_monthly \
  --output_root ./03_bioclim_variables/01_bioclim_by_gcm

# Selective run — specify models and scenarios, tune memory
Rscript 10_scripts/r/bioclim_master.R \
  --input_root ./01_raw_cmip6_data/01_cmip6_monthly \
  --output_root ./03_bioclim_variables/01_bioclim_by_gcm \
  --models "ACCES_CM2,MIROC6,NorESM2-MM" \
  --scenarios "SSP245,SSP585" \
  --overwrite TRUE \
  --memfrac 0.5 \
  --tempdir "D:/terra_scratch"

# Run multicollinearity screening after BIO computation
Rscript 10_scripts/r/multicollinearity_screening.R \
  --project_root . \
  --time_slice "1986_2015" \
  --scenario "historical"

Argument	Default	Description
`--input_root`	required	Root directory containing CMIP6 monthly rasters (the `01_raw_cmip6_data/01_cmip6_monthly` folder)
`--output_root`	required	Destination root for BIO1–BIO19 GeoTIFF outputs
`--models`	auto-detect	Comma-separated GCM folder names (e.g., `ACCES_CM2,MIROC6`)
`--periods`	auto-detect	Comma-separated time periods (e.g., `1986-2015,2021-2050`)
`--scenarios`	auto-detect	Comma-separated SSP codes (e.g., `SSP245,SSP585`)
`--ensemble_stats`	`mean,median,min,max,sd`	Statistics to compute for multi-model ensemble
`--skip_ensemble`	`FALSE`	Skip ensemble step (use when running per-GCM outputs only)
`--overwrite`	`FALSE`	Overwrite existing output files
`--memfrac`	`0.3`	Fraction of RAM allocated to `terra`; increase on high-RAM systems
`--tempdir`	system temp	Scratch directory for `terra` disk-backed operations; redirect to a fast disk if possible
`--log_root`	`<output_root>/_logs`	Directory for timestamped run logs and summary CSV

Output Filename Convention & GeoTIFF Specifications

All output rasters use a standardized naming pattern that encodes model, scenario, period, and variable:

bhutan_cmip6_{model}_{scenario}_{period}_bio{nn}_v1_0.tif

Example: bhutan_cmip6_access_cm2_ssp245_2021_2050_bio01_v1_0.tif

GeoTIFF Specification	Value
Compression	DEFLATE · Predictor = 2 (horizontal differencing)
Tiling	YES — optimized for spatial subsetting without full file read
No-data flag	−9999
CRS	EPSG:5266 — DRUKREF 03 / Bhutan National Grid
Pixel size	250 m
Memory strategy	Disk-backed `terra` operations; never loads the full dataset into RAM

Ensemble interpretationUncertainty & Interpretation

Ensemble Products Explained

Ensemble Mean (ensemble_mean): Represents the consensus trajectory. Recommended for initial model runs and general assessments.

Standard Deviation (ensemble_standard_deviation): Quantifies GCM disagreement. Areas with high SD (e.g., high-altitude precipitation) should be interpreted with caution.

Reporting Uncertainty: When publishing SDM results, report the mean prediction alongside maps showing the range (Min/Max) or uncertainty (SD) to avoid overconfident projections.

Resolution Caution

Do not overinterpret the 250 m resolution. While the grid is fine-scaled, the underlying CMIP6 signals originate from much coarser GCMs. The local detail is a product of downscaling assumptions and should not be mistaken for native GCM resolution.

Recommended workflowModeling Best Practices

6.1 Multicollinearity Screening

To avoid overfitting, screen predictors before training models:

Consult 05_multicollinearity_analysis/vif_results.csv
VIF < 5: Preferred for high-rigor models
VIF < 10: Acceptable but requires justification
Always prioritize ecological relevance over statistical correlation alone

6.2 Spatial Autocorrelation

High-resolution grids often exhibit strong spatial autocorrelation. We recommend:

Spatial Thinning: Ensure occurrence points are not clustered within a single 250 m cell
Block Cross-Validation: Use spatial blocks for model validation rather than random k-fold

6.3 Scenario Reporting Template

When reporting results, use a template such as:

"Predictions were generated for 2051–2080 under the SSP2-4.5 scenario, representing an intermediate emissions trajectory using a 10-GCM ensemble mean (Bhutan CMIP6 BIOCLIM v1.0)."

6.4 Loading Pipeline Outputs in R

All pipeline outputs are standard compressed GeoTIFFs loadable with any R spatial library. Use terra (recommended, matches the pipeline) or sf/stars as needed.

library(terra)

# ── Single ensemble mean raster (historical BIO01) ───────────────────
# Filename pattern: bhutan_cmip6_ensemble_{stat}_{ssp}_{period}_{bio}_v1_0.tif
bio01_hist <- rast(
  "04_ensemble_products/ensemble_mean/1986_2015/historical/bhutan_cmip6_ensemble_mean_historical_1986_2015_bio01_v1_0.tif"
)

# ── Full 19-variable ensemble mean stack for a future scenario ────────
ssp245_2050 <- rast(
  list.files(
    "04_ensemble_products/ensemble_mean/2021_2050/ssp245/",
    pattern = "^bhutan_cmip6_ensemble_mean.*\\.tif$",
    full.names = TRUE
  )
)

# ── Ensemble standard deviation (uncertainty map) ─────────────────────
bio01_sd <- rast(
  "04_ensemble_products/ensemble_standard_deviation/2021_2050/ssp245/bhutan_cmip6_ensemble_standard_deviation_ssp245_2021_2050_bio01_v1_0.tif"
)

# ── VIF-screened predictor set for SDM ───────────────────────────────
# Consult 05_multicollinearity_analysis/selected_predictors.txt for retained variables.
# Load only those from the ensemble mean directory:
selected_vars <- readLines("05_multicollinearity_analysis/selected_predictors.txt")
mean_files <- list.files(
  "04_ensemble_products/ensemble_mean/1986_2015/historical/",
  pattern = "\\.tif$", full.names = TRUE
)
sdm_predictors <- rast(
  mean_files[sapply(mean_files, function(f)
    any(sapply(selected_vars, function(v) grepl(v, basename(f), fixed = TRUE)))
  )]
)

# ── Stack individual GCM outputs to quantify inter-model spread ───────
# Per-GCM file pattern: bhutan_cmip6_{model}_{scenario}_{period}_{bio}_v1_0.tif
gcm_bio01 <- rast(
  list.files(
    "03_bioclim_variables/01_bioclim_by_gcm/",
    pattern = "bio01_v1_0\\.tif$",
    recursive = TRUE, full.names = TRUE
  )
)
bio01_range <- max(gcm_bio01) - min(gcm_bio01)  # pixel-wise inter-model range

# ── Extract values at species occurrence points ───────────────────────
# occ_pts <- vect("occurrences.gpkg")
# climate_vals <- extract(sdm_predictors, occ_pts)

Consult the VIF results before stacking predictors

Check 05_multicollinearity_analysis/vif_results.csv and use only the retained variable set listed there. Loading all 19 variables without screening will introduce collinearity into your SDM.

Known constraintsLimitations

Understanding Constraints

Scenario Uncertainty: Future trajectories are based on socioeconomic assumptions that may change
GCM Structural Uncertainty: Different models simulate climate physics differently (partially addressed by our 10-model ensemble)
Downscaling Assumptions: Methods assume historical climate relationships (e.g., lapse rates) remain constant in the future
Temporal Mismatch: 30-year averages may mask short-term climate extremes or decadal variability
Pseudo-precision: High resolution (250 m) does not imply high accuracy in complex terrain where station data is sparse

Scope of applicationIntended & Not Intended Use

Intended Use

Regional and national species distribution modeling (SDM)
Climate change vulnerability and impact assessments
Strategic conservation planning and protected area gap analysis
Ecological niche modeling and habitat suitability studies
Academic research and institutional decision support

Not Intended Use

Local-scale engineering or infrastructure design (e.g., dam building)
Short-term weather forecasting
Analysis of individual extreme weather events (e.g., specific GLOF events)
Site-specific agricultural planning without local validation
Legal or regulatory compliance without expert consultation

How to citeCitation & Attribution

Primary Source — Required Citation (APA 7)

Dorji, S., Stewart, S., Bajwa, A., Aziz, A., Shabbir, A., & Adkins, S. (2025). High-resolution (250 m) historical and projected (CMIP6) air temperature and precipitation grids for Bhutan (v1) [Data set]. CSIRO. https://doi.org/10.25919/pec2-hs50

Pipeline Citation

Wangdi (2026). BhutanBioClims: CMIP6 Bioclimatic Variable Pipeline for Bhutan (v1.0). Forest Resources Planning and Management Division, Department of Forests and Park Services, Ministry of Energy and Natural Resources, Royal Government of Bhutan, Thimphu, Bhutan. GitHub repository

Credit & Source Development Team

The source CSIRO dataset was developed by Sangay Dorji (University of Queensland) with advice and technical guidance from:

Stephen Stewart — CSIRO Environment, Natural Capital Assessment
Ali Bajwa — La Trobe University
Asad Shabbir — NSW DPI
Ammar Aziz — University of Queensland
Steve Adkins — University of Queensland

CMIP6 Acknowledgment

"We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups for producing and making available their model output."

How to Cite This Protocol

Wangdi (2026). Technical Guide & Modeling Protocol
for the Bhutan CMIP6 BIOCLIM Infrastructure (v1.0). Thimphu, Bhutan.

Key References

Booth et al. (2014). bioclim: the first species distribution modelling package. Diversity and Distributions, 20(1), 1–9. doi:10.1111/ddi.12144
O'Donnell, M.S. & Ignizio, D.A. (2012). Bioclimatic predictors for supporting ecological applications in the conterminous United States. U.S. Geological Survey Data Series 691.
Nix, H.A. (1986). A biogeographic analysis of Australian elapid snakes. In: Longmore, R. (ed.), Atlas of Elapid Snakes of Australia, pp. 4–15. Australian Government Publishing Service, Canberra.
Dorji et al. (2025). Comparative Analysis of Mechanistic and Correlative Models for Bhutan. Plants, 14(1). doi:10.3390/plants14010083
Hijmans, R.J. et al. (2023). terra: Spatial Data Analysis. R package v1.7. CRAN
O'Neill et al. (2016). The Scenario Model Intercomparison Project (ScenarioMIP) for CMIP6. Geosci. Model Dev., 9(9). doi:10.5194/gmd-9-3461-2016
Riahi et al. (2017). The Shared Socioeconomic Pathways. Global Environmental Change, 42:153–168.
Stewart et al. (2017). Topography and the north Indian monsoon on climate interpolation in Bhutan. Int. J. Climatology, 37(S1). doi:10.1002/joc.5045
Stewart et al. (2021). Interpolated climate variables for Bhutan [Raster]. CSIRO. doi:10.25919/m8yh-gt42