rDataRecon

An R package for comparing and reconciling datasets, including SDTM/ADaM data frames and other tabular data.

Compare two data frames, BASE (the reference) and COMPARE (the new version), and get a detailed, structured report of every structural and value-level difference. Designed for clinical data programming, ETL validation, and general data-quality workflows.

Installation

# From CRAN (once published)
install.packages("rDataRecon")

# Stable version from GitHub (requires the remotes package)
remotes::install_github("ganeshbabunn/rDataRecon")

# Development version from the dev branch
remotes::install_github("ganeshbabunn/rDataRecon@dev")

# From a local source tarball (replace X.X with the release version)
install.packages("../rDataRecon_X.X.tar.gz", repos = NULL, type = "source")

Quick start

library(rDataRecon)

base_df <- data.frame(
  id   = 1:5,
  age  = c(25L, 30L, 35L, 40L, 45L),
  name = c("Sarang", "Ganesh", "Swetha", "Jyothi", "Rudresh"),
  stringsAsFactors = FALSE
)

comp_df <- data.frame(
  id   = 1:5,
  age  = c(25L, 31L, 35L, 40L, 45L),
  name = c("Sarang", "ganesh", "Swetha", "Jyothi", "Rudresh"),
  stringsAsFactors = FALSE
)

result <- recon(base_df, comp_df, id = "id")

Extract differences as a tidy data frame:

get_diffs(result)
#   obs_id variable base_value compare_value diff
# 1 id = 2      age         30            31    1
# 2 id = 2     name     Ganesh        ganesh   NA
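The idea behind key-based matching and the tidy diff table can be sketched in a few lines of base R. This is a hypothetical helper, `long_diffs()`, written for illustration. It is not rDataRecon's implementation. It assumes a single key column that uniquely identifies rows in both data frames. It joins the two data frames on the key and emits one row per differing cell.

```r
# Hypothetical helper (not rDataRecon's implementation). Assumes a single
# key column that uniquely identifies rows in both data frames.
long_diffs <- function(base, compare, id) {
  # Inner join on the key; suffixes keep both versions of each shared column
  m <- merge(base, compare, by = id, suffixes = c(".base", ".compare"))
  vars <- setdiff(intersect(names(base), names(compare)), id)
  rows <- lapply(vars, function(v) {
    b <- m[[paste0(v, ".base")]]
    cmp <- m[[paste0(v, ".compare")]]
    # NA-safe "differs": unequal values, or NA on exactly one side
    hit <- (b != cmp) %in% TRUE | xor(is.na(b), is.na(cmp))
    if (!any(hit)) return(NULL)
    data.frame(obs_id        = paste(id, "=", m[[id]][hit]),
               variable      = v,
               base_value    = as.character(b[hit]),
               compare_value = as.character(cmp[hit]),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)  # NULL entries (no differences) are dropped
  if (is.null(out)) return(data.frame())  # no differences at all
  rownames(out) <- NULL
  out
}
```

With the quick-start data, `long_diffs(base_df, comp_df, "id")` flags the same two cells as `get_diffs()` above: `age` and `name` for `id = 2`. It has no numeric `diff` column.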

Features

Feature	Details
Zero dependencies	Base R only; no additional packages required
Key-based matching	Match rows by one or more ID columns
Positional matching	Row-by-row when no ID is specified
Cross-variable mapping	Compare differently-named columns via `base_var` / `compare_var`
Absolute tolerance	`criterion = 0.01`: ignore differences below the threshold
Relative tolerance	`criterion = 0.05, relative = TRUE`: proportional comparison
Focused comparison	`var = c("age", "sbp")`: compare the listed variables only
Report export	`out = "report.txt"`: save the full report to a file
S3 result object	Class `rDataRecon_result` with `print()`, `get_diffs()`, `summary_df()`
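Positional matching (no `id`) amounts to comparing row *i* of BASE with row *i* of COMPARE. The sketch below is a hypothetical base-R helper, `positional_check()`, not the package's code. It also lists the rows that exist on one side only when the row counts differ.

```r
# Hypothetical helper (not rDataRecon's implementation): compare row i of
# BASE with row i of COMPARE and list rows that exist on one side only.
positional_check <- function(base, compare) {
  n <- min(nrow(base), nrow(compare))
  vars <- intersect(names(base), names(compare))
  diff_rows <- lapply(vars, function(v) {
    b <- base[[v]][seq_len(n)]
    cmp <- compare[[v]][seq_len(n)]
    # NA-safe "differs": unequal values, or NA on exactly one side
    which((b != cmp) %in% TRUE | xor(is.na(b), is.na(cmp)))
  })
  names(diff_rows) <- vars
  list(
    diff_rows     = diff_rows[lengths(diff_rows) > 0],        # variable -> row numbers
    base_extra    = setdiff(seq_len(nrow(base)), seq_len(n)),    # rows only in BASE
    compare_extra = setdiff(seq_len(nrow(compare)), seq_len(n))  # rows only in COMPARE
  )
}
```

Positional matching is only meaningful when both datasets are sorted the same way. That is why key-based matching is usually preferred for SDTM/ADaM data.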

Argument reference

`recon()` argument	Behaviour
`base`	Reference dataset
`compare`	Dataset to validate against BASE
`id`	Key column(s) for row matching; if omitted, rows are matched by position
`var`	Restrict comparison to named variables
`base_var` + `compare_var`	Map differently-named columns across datasets
`criterion`	Numeric equality tolerance (default `1e-8`)
`relative = TRUE`	Apply tolerance proportionally
`noequal = TRUE`	Suppress equal-variable output
`listall = TRUE`	Print all variable names in header
`brief = TRUE`	Summary section only
`out = "file.txt"`	Write report to text file

Examples

Cross-variable name mapping

base2 <- data.frame(id = 1:3, revenue = c(100, 200, 300))
comp2 <- data.frame(id = 1:3, sales   = c(100, 250, 300))
recon(base2, comp2, id = "id", base_var = "revenue", compare_var = "sales")
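Conceptually, `base_var` / `compare_var` pair columns by position: the *i*-th name in one vector is compared with the *i*-th name in the other. The hypothetical helper `map_and_compare()` below sketches that idea in base R; it is not rDataRecon's code. It renames the COMPARE columns to the BASE names, joins on a single key column, and returns the keys where each mapped pair differs.

```r
# Hypothetical helper (not rDataRecon's implementation): pair base_var[i]
# with compare_var[i] by renaming, then report keys where each pair differs.
# Assumes a single key column `id`.
map_and_compare <- function(base, compare, id, base_var, compare_var) {
  stopifnot(length(base_var) == length(compare_var),
            all(compare_var %in% names(compare)))
  # Give the COMPARE columns the BASE names so they line up after the join
  names(compare)[match(compare_var, names(compare))] <- base_var
  m <- merge(base[c(id, base_var)], compare[c(id, base_var)],
             by = id, suffixes = c(".base", ".compare"))
  res <- lapply(base_var, function(v) {
    b <- m[[paste0(v, ".base")]]
    cmp <- m[[paste0(v, ".compare")]]
    # NA-safe "differs": unequal values, or NA on exactly one side
    m[[id]][(b != cmp) %in% TRUE | xor(is.na(b), is.na(cmp))]
  })
  names(res) <- base_var
  res
}
```

For `base2` and `comp2` above, `map_and_compare(base2, comp2, "id", "revenue", "sales")` returns `list(revenue = 2L)`, because only row `id = 2` differs (200 vs 250).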

Relative tolerance (5 %)

recon(base_df, comp_df, id = "id", criterion = 0.05, relative = TRUE)
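The difference between absolute and relative tolerance can be shown with a small hypothetical function, `within_tolerance()`. The exact relative formula rDataRecon uses is not stated in this README. This sketch therefore assumes the simplest form, `|b - c| <= criterion * |b|`, with BASE as the reference. Absolute mode uses `|b - c| <= criterion`, matching the `1e-8` default.

```r
# Hypothetical sketch, not rDataRecon's documented formula: absolute mode
# accepts |b - c| <= criterion; relative mode scales the limit by |b|.
within_tolerance <- function(b, cmp, criterion = 1e-8, relative = FALSE) {
  d <- abs(b - cmp)
  limit <- if (relative) criterion * abs(b) else criterion
  # NA on either side yields NA rather than being silently treated as equal
  d <= limit
}

within_tolerance(100, 104, criterion = 0.05, relative = TRUE)  # TRUE: 4 <= 5
within_tolerance(30, 31, criterion = 0.05)                     # FALSE: 1 > 0.05
```

One consequence of this assumed formula is that a BASE value of 0 gives a limit of 0 in relative mode, so only exact equality passes. Tools differ in how they handle this edge case, so check the package documentation before relying on it.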

Multi-column key

# base_visits / comp_visits: hypothetical data frames that both contain `site` and `visit`
recon(base_visits, comp_visits, id = c("site", "visit"))
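With several ID columns, each row is identified by the combination of key values. The hypothetical helper `match_on_keys()` sketches this in base R; it is not part of rDataRecon. It pastes the key columns into one composite key and looks up each BASE row's partner in COMPARE. It stops if the keys are not unique.

```r
# Hypothetical helper (not rDataRecon's implementation): build one composite
# key per row from several ID columns and find each BASE row in COMPARE.
match_on_keys <- function(base, compare, id) {
  # "\u001f" (unit separator) is assumed never to occur inside key values
  key_b <- do.call(paste, c(base[id], sep = "\u001f"))
  key_c <- do.call(paste, c(compare[id], sep = "\u001f"))
  if (anyDuplicated(key_b) > 0 || anyDuplicated(key_c) > 0)
    stop("ID columns do not uniquely identify rows")
  match(key_b, key_c)  # row index in COMPARE; NA = no partner
}
```

An `NA` in the result marks a BASE observation that has no counterpart in COMPARE. Reversing the arguments finds COMPARE-only observations.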

Save report to file

recon(base_df, comp_df, id = "id", out = "rDataRecon_report.txt")

Programmatic use in QC pipelines

r <- recon(base_df, comp_df, id = "id", brief = TRUE)

summary_df(r)          # one-row log entry
compare_stats(base_df, comp_df)   # side-by-side numeric stats
is_structure_equal(base_df, comp_df)  # structural check

Report structure

recon() prints a four-section plain-text report:

========================================================================
               rDataRecon  -  Dataset Comparison Report
========================================================================
  BASE    : base_df              5 obs  x  3 vars
  COMPARE : comp_df              5 obs  x  3 vars
  Criterion : 1e-08  (absolute)
  Run time  : 2025-01-01 12:00:00
------------------------------------------------------------------------
  SECTION 1  -  Variable Structure
  SECTION 2  -  Variable Type Mismatches
  SECTION 3  -  Value Comparison
  SECTION 4  -  Summary Statistics
========================================================================
  VERDICT :  DATASETS HAVE DIFFERENCES
========================================================================
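Sections 1 and 2 of the report cover structure: variables present on only one side, and shared variables whose types differ. A hypothetical base-R sketch of those two checks follows; it is not the package's code. It returns fields shaped like `vars_base_only`, `vars_compare_only` and `type_mismatches` in the return value.

```r
# Hypothetical sketch (not rDataRecon's implementation) of the structural
# checks: one-sided variables and class mismatches on shared variables.
structure_check <- function(base, compare) {
  shared <- intersect(names(base), names(compare))
  # First class of each shared column, e.g. "integer" vs "numeric"
  cls_b <- vapply(base[shared], function(x) class(x)[1], character(1))
  cls_c <- vapply(compare[shared], function(x) class(x)[1], character(1))
  bad <- cls_b != cls_c
  list(
    vars_base_only    = setdiff(names(base), names(compare)),
    vars_compare_only = setdiff(names(compare), names(base)),
    type_mismatches   = data.frame(variable      = shared[bad],
                                   base_class    = unname(cls_b[bad]),
                                   compare_class = unname(cls_c[bad]),
                                   stringsAsFactors = FALSE)
  )
}
```

A common trigger for a type mismatch is an integer column in one dataset that was read back as double in the other, even though the values are identical.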

Return value

recon() invisibly returns a list of class "rDataRecon_result" with these elements:

$summary             list of counts (obs, vars, diffs)
$vars_base_only      character vector
$vars_compare_only   character vector
$type_mismatches     data.frame (variable, base_class, compare_class)
$value_diffs         data.frame (obs_id, variable, base_value,
                                  compare_value, diff)
$verdict             character string
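Because the result is a plain list, a QC pipeline can gate on its fields. The sketch below uses a hypothetical helper, `is_clean()`, which is not part of rDataRecon. It treats a result as clean only when no one-sided variables, type mismatches or value differences were recorded. To stay runnable without the package, it is demonstrated on a hand-built mock with the documented field names. A real `recon()` result may contain more detail.

```r
# Hypothetical QC gate (not part of rDataRecon): clean means no structural,
# type or value differences were recorded in the result.
is_clean <- function(res) {
  length(res$vars_base_only) == 0 &&
    length(res$vars_compare_only) == 0 &&
    NROW(res$type_mismatches) == 0 &&
    NROW(res$value_diffs) == 0
}

# Mock object with the documented fields, for illustration only
mock <- structure(list(
  summary           = list(),
  vars_base_only    = character(0),
  vars_compare_only = character(0),
  type_mismatches   = data.frame(variable = character(0), base_class = character(0),
                                 compare_class = character(0)),
  value_diffs       = data.frame(obs_id = "id = 2", variable = "age",
                                 base_value = "30", compare_value = "31", diff = 1),
  verdict           = "DATASETS HAVE DIFFERENCES"
), class = "rDataRecon_result")

is_clean(mock)  # FALSE: one value difference
```

In a pipeline, a call such as `stopifnot(is_clean(r))` after `r <- recon(...)` would halt on any difference. Whether every real result populates these fields exactly as in the mock is an assumption.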

Bug Reports & Feature Requests

Found a bug or have an idea to improve rDataRecon? Please raise it via the GitHub Issues tab.

When opening a new issue, use the appropriate label:

Type	Label	When to use
Bug	`bug`	Something is broken or behaving unexpectedly
Feature	`enhancement`	A new feature or improvement you'd like to see

Steps to raise an issue

1. Go to the Issues tab
2. Click "New Issue"
3. Choose the relevant template (Bug Report or Feature Request)
4. Fill in the details and apply the correct label (bug or enhancement)
5. Click "Submit new issue"

Tip: Before opening a new issue, please search existing issues to avoid duplicates.
