BORG

Cross-validation for data that isn't independent

Cross-validation that respects spatial, temporal, and grouped structure, with the dependency detected for you.

Hand BORG your data and the columns that carry structure (coordinates, a timestamp, a grouping ID). It tests for spatial autocorrelation (Moran's I), temporal autocorrelation (ACF / Ljung-Box), and clustering (ICC), then builds folds that hold the structure out instead of splitting through it. When a dependency is present, random k-fold is disabled unless you ask for it explicitly. The goal is that the score you report from CV matches the score you get in the field.

library(BORG)

spatial_data <- data.frame(
  lon = runif(200, -10, 10),
  lat = runif(200, -10, 10),
  response = rnorm(200)
)

# diagnose the dependency and get folds that respect it
result <- borg(spatial_data, coords = c("lon", "lat"), target = "response")
result$diagnosis@recommended_cv
#> "spatial_block"
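
For intuition about the temporal diagnostic named above, the sketch below runs the same kind of test with base R only (`acf()` and `Box.test()` from the stats package). It is not BORG's internal code; the simulated AR(1) series and the choice of 10 lags are assumptions made for illustration.

```r
# Simulated AR(1) series; any numeric vector ordered in time would do.
set.seed(1)
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = 200))

# Sample autocorrelation at lag 1 (element 1 of $acf is lag 0).
lag1 <- acf(x, lag.max = 10, plot = FALSE)$acf[2]

# Ljung-Box test of the null hypothesis that the first 10 lags are all zero.
lb <- Box.test(x, lag = 10, type = "Ljung-Box")

lag1
lb$p.value
```

A small p-value here is the signal that neighbouring time points are not independent, which is the situation where a random split lets the model see the test period's neighbours.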

Diagnose first, then enforce

Spatial CV tools such as blockCV, CAST, and spatialsample give you the blocking schemes but leave the decision to you: you have to know the data are autocorrelated, pick a block size, and choose to use it. BORG runs the diagnosis first. If it finds structure, it estimates how much random CV would inflate your metric, refuses random folds, and generates a scheme matched to the dependency it found. Overriding is one explicit argument away, and the override is logged.

borg_diagnose(spatial_data, coords = c("lon", "lat"), target = "response")
#> Dependency:  SPATIAL (moderate severity)
#> Moran's I:   0.18 (p < 0.001)
#> Strategy:    spatial_block
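
To show what the Moran's I line measures, here is a minimal base-R sketch. The inverse-distance weights, the simulated surface, and the permutation test are assumptions chosen for illustration; BORG's own weighting and p-value method may differ.

```r
set.seed(42)
n   <- 200
lon <- runif(n, -10, 10)
lat <- runif(n, -10, 10)
# A smooth surface plus noise, so nearby points have similar values.
z   <- sin(lon / 3) + cos(lat / 3) + rnorm(n, sd = 0.3)

# Inverse-distance weights with a zero diagonal (an assumed weighting scheme).
d <- as.matrix(dist(cbind(lon, lat)))
w <- 1 / d
diag(w) <- 0

# Moran's I: weighted cross-products of deviations over total variance.
morans_i <- function(x, w) {
  dev <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(dev, dev)) / sum(dev^2)
}

i_obs <- morans_i(z, w)

# Permutation test: shuffling values over locations breaks spatial structure.
i_perm <- replicate(499, morans_i(sample(z), w))
p_val  <- (1 + sum(i_perm >= i_obs)) / (1 + length(i_perm))

i_obs
p_val
```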

Cross-validation schemes

borg_cv() covers the schemes used in dependent-data evaluation:

Spatial: spatial blocking, environmental blocking, checkerboard, hexagonal, leave-location-out, spatial-plus (buffered), and KNNDM matched to prediction points.
Temporal: chronological splits with an embargo, expanding and sliding windows, and de Prado's purged CV.
Grouped: leave-group-out, so no cluster appears in both train and test.
Mixed: spatial-temporal and spatial-group combinations.
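
The idea behind the simplest of these schemes, spatial blocking, can be sketched in base R: cut the study area into grid cells and assign whole cells to folds. This is a simplified illustration, not what `borg_cv()` does internally; `block_size` and `k` are hypothetical values picked for the example.

```r
set.seed(1)
pts <- data.frame(lon = runif(200, -10, 10), lat = runif(200, -10, 10))

block_size <- 5   # assumed block width, in coordinate units
k <- 5            # number of folds

# Label each point with the grid cell (block) it falls in.
block_id <- paste(floor(pts$lon / block_size), floor(pts$lat / block_size))

# Assign whole blocks, not individual points, to folds at random.
blocks <- unique(block_id)
fold_of_block <- setNames(sample(rep_len(seq_len(k), length(blocks))), blocks)
fold <- unname(fold_of_block[block_id])

# Plain list of train/test row indices, one element per fold.
folds <- lapply(seq_len(k), function(i) {
  list(train = which(fold != i), test = which(fold == i))
})
```

Because a block never straddles two folds, a test point's nearest neighbours tend to be held out with it rather than sitting in the training set.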

Output is a plain list of train/test indices by default, or an rsample, caret, or mlr3 object for direct use in those frameworks.

# tidymodels-compatible folds
folds <- borg(spatial_data, coords = c("lon", "lat"), output = "rsample")
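
For the temporal schemes, the plain-list form is easy to picture. The sketch below builds one chronological split by hand, with a gap between training and test; the gap length and window boundaries are hypothetical, and BORG's own embargo logic may be defined differently.

```r
n <- 100
time <- seq_len(n)          # observations assumed sorted by time

test_start <- 81            # hold out the last 20 time points
embargo    <- 5             # assumed gap, in time steps

test_idx  <- which(time >= test_start)
# Drop the `embargo` observations just before the test window from training.
train_idx <- which(time < test_start - embargo)

split <- list(train = train_idx, test = test_idx)
range(split$train)   # 1 75
range(split$test)    # 81 100
```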

Show the inflation, not just the warning

borg_compare_cv() runs random and structure-aware CV on the same data and model, and reports the gap with a paired t-test. This is the evidence a reviewer asks for when you switch a paper from random to blocked CV.

comparison <- borg_compare_cv(
  spatial_data,
  formula = response ~ lon + lat,
  coords  = c("lon", "lat"),
  repeats = 10
)
print(comparison)
plot(comparison)
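
The mechanics of such a comparison can be sketched in base R: score the same model under random and blocked folds across repeats, then test the paired differences. This is an illustration under assumed data, block size, and a linear model, not the implementation of `borg_compare_cv()`.

```r
set.seed(7)
n <- 200
d <- data.frame(lon = runif(n, -10, 10), lat = runif(n, -10, 10))
d$response <- sin(d$lon / 3) + cos(d$lat / 3) + rnorm(n, sd = 0.3)

# Cross-validated RMSE of a linear model under a given fold assignment.
cv_rmse <- function(fold) {
  errs <- unlist(lapply(unique(fold), function(i) {
    fit <- lm(response ~ lon + lat, data = d[fold != i, ])
    d$response[fold == i] - predict(fit, newdata = d[fold == i, ])
  }))
  sqrt(mean(errs^2))
}

# 5 x 5 unit grid cells serve as spatial blocks (an assumed block size).
block_id <- paste(floor(d$lon / 5), floor(d$lat / 5))
blocks   <- unique(block_id)

one_repeat <- function() {
  random_fold <- sample(rep_len(1:5, n))
  block_fold  <- sample(rep_len(1:5, length(blocks)))[match(block_id, blocks)]
  c(random = cv_rmse(random_fold), blocked = cv_rmse(block_fold))
}

# One row per repeat, one column per scheme; then a paired t-test on the gap.
res <- t(replicate(10, one_repeat()))
tt  <- t.test(res[, "blocked"], res[, "random"], paired = TRUE)
tt
```

Pairing by repeat keeps the comparison on the same data and model, so the test is about the fold scheme alone. The gap is usually larger for flexible learners than for the linear model used here.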

borg_power() answers the other half of that conversation: blocking costs you effective sample size, so it reports the design effect, the effective n, and the minimum effect you can still detect.
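
As a rough guide to the quantities involved, the sketch below estimates the ICC from a one-way ANOVA and applies the standard Kish design effect, deff = 1 + (m - 1) * ICC, for balanced groups of size m. The simulated data are an assumption, and the formulas `borg_power()` actually uses are not shown in this README, so treat this as the textbook version only.

```r
set.seed(3)
g <- 20; m <- 10                                   # 20 groups of 10 observations
group <- factor(rep(seq_len(g), each = m))
# Group effect plus noise with equal variances, so the true ICC is 0.5.
y <- rnorm(g)[as.integer(group)] + rnorm(g * m)

# One-way ANOVA estimate of the ICC for balanced groups.
ms  <- summary(aov(y ~ group))[[1]][["Mean Sq"]]   # between, within
icc <- (ms[1] - ms[2]) / (ms[1] + (m - 1) * ms[2])

deff  <- 1 + (m - 1) * icc                         # Kish design effect
n_eff <- length(y) / deff                          # effective sample size

c(icc = icc, deff = deff, n_eff = n_eff)
```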

Leakage detection

Before it looks for dependency structure, BORG checks the split itself. borg() in validation mode and borg_inspect() catch the leaks that quietly inflate a held-out score:

index overlap and duplicate rows across train and test (fast hash-based checks in C++),
preprocessing fitted on the full data (scale, prcomp, caret::preProcess, recipes steps),
target leakage from a feature near-perfectly correlated with the outcome,
group leakage and look-ahead in time.

# scaled before splitting: test rows contributed to the scaler's means and SDs
data_scaled <- scale(iris[, 1:4])
borg_inspect(data_scaled, train_idx = 1:100, test_idx = 101:150)
#> INVALID — Hard violation: preprocessing_leak
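
The leak-free version of that preprocessing step, sketched in base R with the same indices: fit the scaler on the training rows, then reuse its parameters on the test rows. Whether `borg_inspect()` accepts the result in this form is not shown in this README, so the sketch stops at the scaling itself.

```r
train_idx <- 1:100
test_idx  <- 101:150
x <- as.matrix(iris[, 1:4])

# Fit the scaler on the training rows only.
train_scaled <- scale(x[train_idx, ])
centers <- attr(train_scaled, "scaled:center")
scales  <- attr(train_scaled, "scaled:scale")

# Apply the training-set parameters to the test rows.
test_scaled <- scale(x[test_idx, ], center = centers, scale = scales)
```

The test columns no longer have mean zero, which is expected: they are expressed in the training set's units instead of their own.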

Hard violations (overlap, duplicates, preprocessing leak, target leak, group leak, look-ahead) invalidate the result and block it. Soft inflations (a near-threshold proxy feature, test points close to training) warn and let you proceed with the caveat recorded.
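
The two simplest hard violations, index overlap and duplicate rows, can be checked by hand in base R. The toy data frame below is made up for illustration, and the pasted-key comparison is a plain stand-in for the hash-based check, not BORG's C++ code.

```r
d <- data.frame(x = c(1, 2, 3, 4, 2, 6),
                y = c("a", "b", "c", "d", "b", "f"))
train_idx <- 1:4
test_idx  <- 4:6            # row 4 is deliberately in both sets

# Index overlap: rows used for both fitting and scoring.
overlap <- intersect(train_idx, test_idx)                    # 4

# Duplicate rows: test rows whose values also occur in training,
# compared on a key pasted together from every column.
key <- do.call(paste, c(d, sep = "\r"))
train_only  <- setdiff(train_idx, test_idx)
dup_in_test <- test_idx[key[test_idx] %in% key[train_only]]  # 5 (same values as row 2)
```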

Area of applicability and prediction maps

borg_aoa() and borg_di() flag prediction locations that sit outside the feature space the model was trained on, following Meyer & Pebesma (2021). borg_extract(), borg_thin(), borg_predict_raster(), and borg_leaflet() cover the surrounding spatial workflow, from raster extraction to an interactive map of folds.
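
A simplified version of the dissimilarity index can be written in base R: the distance from each new point to its nearest training point in standardised predictor space, divided by the mean pairwise distance within the training data. This sketch omits the variable-importance weighting and the cross-validated threshold used by Meyer & Pebesma (2021), and it is not the code behind `borg_di()`; the simulated predictors are assumptions.

```r
set.seed(5)
train <- matrix(rnorm(100 * 3), ncol = 3)
new   <- rbind(rnorm(3), c(8, 8, 8))   # one typical point, one far outside

# Standardise both sets with training-set parameters.
mu <- colMeans(train)
s  <- apply(train, 2, sd)
train_s <- scale(train, center = mu, scale = s)
new_s   <- scale(new,   center = mu, scale = s)

# Mean pairwise distance within the training data.
d_bar <- mean(dist(train_s))

# Distance from each new point to its nearest training point.
nearest <- apply(new_s, 1, function(p) {
  min(sqrt(colSums((t(train_s) - p)^2)))
})

di <- nearest / d_bar
di
```

The second point, far outside the training cloud, gets a much larger index than the first, which is the pattern an area-of-applicability mask is built on.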

Methods text for the manuscript

summary() on a BORG result writes a methods paragraph with the test statistics filled in (Moran's I, ACF, ICC, and their p-values), in APA, Nature, or Ecology style. borg_certificate() and borg_export() write the same assessment as a machine-readable YAML or JSON record for an audit trail.

summary(result)
#> Model performance was evaluated using spatial block cross-validation
#> (k = 5 folds). Spatial autocorrelation was detected (Moran's I = 0.18,
#> p < 0.001) ...

Installation

# development version from GitHub
install.packages("pak")
pak::pak("gcol33/BORG")

Documentation

Support

"Software is like sex: it's better when it's free." — Linus Torvalds

I'm a PhD student who builds R packages in my free time because I believe good tools should be free and open. I started these projects for my own work and figured others might find them useful too.

If this package saved you some time, buying me a coffee is a nice way to say thanks. It helps with my coffee addiction.

License

MIT (see the LICENSE.md file)

Citation

@software{BORG,
  author = {Colling, Gilles},
  title = {BORG: Bounded Outcome Risk Guard for Model Evaluation},
  year = {2026},
  url = {https://github.com/gcol33/BORG}
}
