An R package for dataset comparison and reconciliation of SDTM/ADaM and other data frames.
Compare two data frames, BASE (reference) and COMPARE (new version), and get a detailed, structured report of every structural and value-level difference. Designed for clinical data programming, ETL validation, and general data-quality workflows.
# From CRAN (once published)
install.packages("rDataRecon")
# Stable version:
remotes::install_github("ganeshbabunn/rDataRecon")
# Development version:
remotes::install_github("ganeshbabunn/rDataRecon@dev")
# From local source
install.packages("../rDataRecon_X.X.tar.gz", repos = NULL, type = "source")
Replace X.X with the version of the release.

library(rDataRecon)
base_df <- data.frame(
id = 1:5,
age = c(25L, 30L, 35L, 40L, 45L),
name = c("Sarang", "Ganesh", "Swetha", "Jyothi", "Rudresh"),
stringsAsFactors = FALSE
)
comp_df <- data.frame(
id = 1:5,
age = c(25L, 31L, 35L, 40L, 45L),
name = c("Sarang", "ganesh", "Swetha", "Jyothi", "Rudresh"),
stringsAsFactors = FALSE
)
result <- recon(base_df, comp_df, id = "id")

Extract differences as a tidy data frame:
get_diffs(result)
# obs_id variable base_value compare_value diff
# 1 id = 2 age 30 31 1
# 2 id = 2 name Ganesh ganesh NA

| Feature | Details |
|---|---|
| Zero dependencies | Base R only; no additional packages required |
| Key-based matching | Match rows by one or more ID columns |
| Positional matching | Row-by-row when no ID is specified |
| Cross-variable mapping | Compare differently-named columns via `base_var` / `compare_var` |
| Absolute tolerance | `criterion = 0.01`: ignore differences below the threshold |
| Relative tolerance | `criterion = 0.05, relative = TRUE`: proportional comparison |
| Focused comparison | `var = c("age","sbp")`: compare specific variables only |
| Report export | `out = "report.txt"`: save the full report to a file |
| S3 result object | Class `rDataRecon_result` with `print()`, `get_diffs()`, `summary_df()` |
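
The difference between positional and key-based matching can be illustrated without the package. The following is a minimal base R sketch, not rDataRecon's implementation: the data frames are hypothetical, and `merge()` stands in for whatever alignment `recon()` performs internally.

```r
# Hypothetical data: same subjects, but COMPARE is sorted differently
# and one age value has changed (id 2: 30 -> 31).
base_df <- data.frame(id = c(1L, 2L, 3L), age = c(25L, 30L, 35L))
comp_df <- data.frame(id = c(3L, 1L, 2L), age = c(35L, 25L, 31L))

# Positional matching: row i of BASE is compared with row i of COMPARE.
positional_diff <- which(base_df$age != comp_df$age)

# Key-based matching: align rows on `id` first, then compare values.
merged <- merge(base_df, comp_df, by = "id", suffixes = c("_base", "_compare"))
keyed_diff <- merged$id[merged$age_base != merged$age_compare]

print(positional_diff)  # row positions that differ
print(keyed_diff)       # ids that differ after alignment
```

With positional matching every row is flagged because the sort order differs, while key-based matching isolates the single changed record. This is why supplying `id` matters whenever row order is not guaranteed.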
| `recon()` argument | Behaviour |
|---|---|
| `base` | Reference dataset |
| `compare` | Dataset to validate against BASE |
| `id` | Key column(s) for row matching |
| `var` | Restrict comparison to named variables |
| `base_var` + `compare_var` | Map differently-named columns across datasets |
| `criterion` | Numeric equality tolerance (default `1e-8`) |
| `relative = TRUE` | Apply tolerance proportionally |
| `noequal = TRUE` | Suppress equal-variable output |
| `listall = TRUE` | Print all variable names in header |
| `brief = TRUE` | Summary section only |
| `out = "file.txt"` | Write report to text file |

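
To make the `criterion` and `relative` options concrete, here is a small base R sketch of absolute versus relative tolerance. The helper `within_tolerance()` is hypothetical and not part of rDataRecon, and the relative rule shown (scaling by the larger magnitude) is one common convention assumed for illustration; the formula the package actually uses may differ.

```r
# Hypothetical helper, not part of rDataRecon.
# Returns TRUE where two numeric values count as equal under the tolerance.
within_tolerance <- function(base, compare, criterion = 1e-8, relative = FALSE) {
  abs_diff <- abs(base - compare)
  if (!relative) {
    # Absolute tolerance: the raw difference must not exceed the criterion.
    return(abs_diff <= criterion)
  }
  # Assumed relative rule: scale the difference by the larger magnitude.
  scale <- pmax(abs(base), abs(compare))
  # Two zeros are treated as equal, which also avoids dividing by zero.
  ifelse(scale == 0, TRUE, abs_diff / scale <= criterion)
}

base_vals <- c(100, 200, 0.10)
comp_vals <- c(100.004, 204, 0.12)

abs_equal <- within_tolerance(base_vals, comp_vals, criterion = 0.01)
rel_equal <- within_tolerance(base_vals, comp_vals, criterion = 0.05, relative = TRUE)

print(abs_equal)
print(rel_equal)
```

In this sketch the pair 200 vs 204 fails the absolute check but passes the 5% relative check, while 0.10 vs 0.12 fails both. A relative tolerance is usually the better fit when variables span very different magnitudes.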
base2 <- data.frame(id = 1:3, revenue = c(100, 200, 300))
comp2 <- data.frame(id = 1:3, sales = c(100, 250, 300))
recon(base2, comp2, id = "id", base_var = "revenue", compare_var = "sales") # cross-variable mapping

recon(base_df, comp_df, id = "id", criterion = 0.05, relative = TRUE) # relative tolerance

recon(base_df, comp_df, id = c("site", "visit")) # multiple key columns

recon(base_df, comp_df, id = "id", out = "rDataRecon_report.txt") # export report to file

r <- recon(base_df, comp_df, id = "id", brief = TRUE) # summary section only
summary_df(r) # one-row log entry
compare_stats(base_df, comp_df) # side-by-side numeric stats
is_structure_equal(base_df, comp_df) # structural check

recon() prints a four-section plain-text report:
========================================================================
rDataRecon - Dataset Comparison Report
========================================================================
BASE : base_df 5 obs x 3 vars
COMPARE : comp_df 5 obs x 3 vars
Criterion : 1e-08 (absolute)
Run time : 2025-01-01 12:00:00
------------------------------------------------------------------------
SECTION 1 - Variable Structure
SECTION 2 - Variable Type Mismatches
SECTION 3 - Value Comparison
SECTION 4 - Summary Statistics
========================================================================
VERDICT : DATASETS HAVE DIFFERENCES
========================================================================
recon() returns an "rDataRecon_result" list invisibly:
$summary list of counts (obs, vars, diffs)
$vars_base_only character vector
$vars_compare_only character vector
$type_mismatches data.frame (variable, base_class, compare_class)
$value_diffs data.frame (obs_id, variable, base_value,
compare_value, diff)
$verdict character string
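
Because the result is an ordinary list, its components can be inspected with base R. The sketch below uses a hand-built stand-in that mirrors the documented layout. It is not output from `recon()`, and the column types (for example, character `base_value`) are assumptions made for illustration.

```r
# Hand-built stand-in for the documented result layout.
# Assumption: this mimics, but was not produced by, recon().
mock_result <- list(
  vars_base_only = character(0),
  vars_compare_only = character(0),
  value_diffs = data.frame(
    obs_id = c("id = 2", "id = 2"),
    variable = c("age", "name"),
    base_value = c("30", "Ganesh"),
    compare_value = c("31", "ganesh"),
    diff = c(1, NA),
    stringsAsFactors = FALSE
  ),
  verdict = "DATASETS HAVE DIFFERENCES"
)

diffs <- mock_result$value_diffs

# Count differing values per variable.
diffs_per_variable <- table(diffs$variable)

# Keep numeric differences only; character mismatches carry NA in `diff`.
numeric_diffs <- diffs[!is.na(diffs$diff), ]

print(diffs_per_variable)
print(numeric_diffs)
```

The same filtering should apply to a real result object as long as `$value_diffs` has the columns listed above.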
Found a bug or have an idea to improve rDataRecon?
Please raise it via the GitHub Issues tab.
When opening a new issue, use the appropriate label:

| Type | Label | When to use |
|---|---|---|
| Bug | `bug` | Something is broken or behaving unexpectedly |
| Feature | `enhancement` | A new feature or improvement you'd like to see |

- Go to the Issues tab
- Click "New Issue"
- Choose the relevant template — Bug Report or Feature Request
- Fill in the details and apply the correct label (`bug` or `enhancement`)
- Click "Submit new issue"

Tip: Before opening a new issue, please search existing issues to avoid duplicates.
GPL-3.0 Copyright (C) 2026 Ganesh Babu G