add AMALIA algorithm for lead isotope provenance matching by Abagna123 · Pull Request #51 · archaeothommy/ASTR

Abagna123 · 2026-06-14T15:33:17Z

Hi @archaeothommy, I replaced the nested loops with expand.grid and mapply. The function is working and all tests pass (I haven't pushed the test function yet). Since the Argentina database has no measurement errors, I derived absolute 2SD uncertainties for the reference data using the SRM-981 interlaboratory ranges from the paper (0.10%, 0.14%, 0.20%, 0.13%, 0.20%) multiplied by each ratio value.

archaeothommy

The code looks fine overall, thank you! The idea of using expand.grid and mapply is great!

I think it the function could be quite significantly simplified, as explained in the comments.

archaeothommy · 2026-06-14T22:23:57Z

+#'   "207Pb/206Pb_err2SD", "208Pb/206Pb_err2SD")`.
+#' @param id_sample String with the column name of the sample IDs in `df`.
+#'   Default is `"ID"`.
+#' @param id_ref String with the column name of the reference groups in `ref`.


This would be better referred to as "group" rather than individual IDs. That's how it must be handled at least (this is admittedly not well documented in the pointcloud-function). We cannot expect that reference data have individual IDs. A table with e.g. the regions as only information except the isotope ratios and analytical values must be sufficient.

archaeothommy · 2026-06-14T22:30:08Z

+      diff_1 = abs(as.numeric(df[matched$i, iso[1]]) -
+                     as.numeric(ref[matched$j, iso[1]])),
+      diff_2 = abs(as.numeric(df[matched$i, iso[2]]) -
+                     as.numeric(ref[matched$j, iso[2]])),
+      diff_3 = abs(as.numeric(df[matched$i, iso[3]]) -
+                     as.numeric(ref[matched$j, iso[3]])),


Is this necessary? I thought it is a TRUE/FALSE type of matching. If it is required, why don't you recycle the values calculated in lines 142 and 143?

archaeothommy · 2026-06-14T22:30:47Z

+#' @param ratios_204 Character vector of length 3 with column names of the
+#'   204Pb-normalised isotope ratios. Must be identical in `df` and `ref`.
+#'   Default to `c("206Pb/204Pb", "207Pb/204Pb", "208Pb/204Pb")`.
+#' @param ratios_206 Character vector of length 3 with column names of the
+#'   206Pb-normalised isotope ratios. Must be identical in `df` and `ref`. Only
+#'   used when `triplet = "206"` or `triplet = "both"`. Default to
+#'   `c("206Pb/204Pb", "207Pb/206Pb", "208Pb/206Pb")`.
+#' @param error_204 Character vector of length 3 with column names of the
+#'   analytical uncertainty in 2SD for the 204Pb-normalised ratios. Must be
+#'   identical in both `df` and `ref`. Default to `c("206Pb/204Pb_err2SD",
+#'   "207Pb/204Pb_err2SD", "208Pb/204Pb_err2SD")`.
+#' @param error_206 Character vector of length 3 with column names of the
+#'   analytical uncertainty (2SD) for the 206Pb-normalised ratios. Must be
+#'   identical in both `df` and `ref`. Only used when `triplet = "206"` or
+#'   `triplet = "both"`. Default to `c("206Pb/204Pb_err2SD",
+#'   "207Pb/206Pb_err2SD", "208Pb/206Pb_err2SD")`.


Make sure that the function requires only the input of the triplet that should be used. I think this applies only to the checks. The input for the other triplet should just be ignored.
It is probably best to make this check based on the value of triplet. I did not saw a check that the column names and the value of triplet are matching.

archaeothommy · 2026-06-14T22:40:19Z

+  if (triplet == "204Pb" || triplet == "both") {
+    matches_204 <- run_amalia(ratios_204, error_204)
+  }
+
+  if (triplet == "206Pb" || triplet == "both") {
+    matches_206 <- run_amalia(ratios_206, error_206)
+  }
+
+  # Combine results
+  if (triplet == "204Pb") {
+    matches <- matches_204
+  }
+
+  if (triplet == "206Pb") {
+    matches <- matches_206
+  }
+
+  if (triplet == "both") {
+    if (nrow(matches_204) == 0 || nrow(matches_206) == 0) {
+      matches <- data.frame()
+    } else {
+      pair_204 <- paste(matches_204$sample_id, matches_204$ref_id)
+      pair_206 <- paste(matches_206$sample_id, matches_206$ref_id)
+      # Strict intersection — must pass in both triplets
+      keep_204 <- pair_204 %in% pair_206
+      keep_206 <- pair_206 %in% pair_204
+      # Combine diff columns from both triplets for matched pairs
+      matches <- cbind(
+        matches_204[keep_204, ],
+        matches_206[keep_206, 3:5]
+      )
+    }
+  }


I think this all can be simplified by having the matching running over all isotope ratios at once if triplet = "both". The algorithm returns FALSE when one value in a triplet doesn't match. Consequently, running the matching of all six ratios at the same time will give the same result as running the matching for each triplet separately and figuring out which matches both triples have in common.
Implementing this would require to use triplet only for determining which isotope ratios should be supplied to run_amalia() and all output of this function can then be handled the same.

archaeothommy · 2026-06-14T22:43:04Z

+    stop("Reference ID column '", id_ref, "' not found in reference data.")
+  }
+
+  run_amalia <- function(iso, err) {


Please avoid undocumented function nesting. See e.g. the ASTR parsing functions how they should be arranged and documented.

archaeothommy · 2026-06-14T22:55:53Z

+  if (nrow(matches) > 0) {
+    n_matches <- as.data.frame(table(matches$sample_id))
+    names(n_matches) <- c("sample_id", "n_matches")
+    missing <- setdiff(all_ids, n_matches$sample_id)
+    if (length(missing) > 0) {
+      n_matches <- rbind(
+        n_matches,
+        data.frame(sample_id = missing, n_matches = 0L)
+      )
+    }
+    n_matches <- n_matches[match(all_ids, n_matches$sample_id), ]
+  } else {
+    n_matches <- data.frame(
+      sample_id = all_ids,
+      n_matches = 0L
+    )
+  }


It should be possible to simplify this with ifelse(), probably something similar to:

all_ids$n_matches <- ifelse(all_ids `%in%` matches$sample_id, <get summarised values from table() call>, 0L)

archaeothommy · 2026-06-14T22:57:22Z

+
+  if (length(unmatched) > 0) {
+    message(
+      length(unmatched), " sample(s) had no matches: ",


Suggested change

length(unmatched), " sample(s) had no matches: ",

length(unmatched), " sample(s) without matches: ",

archaeothommy · 2026-06-14T22:58:41Z

+#' df <- data.frame(
+#'   ID = c("Art1", "Art2", "Art3"),
+#'   `206Pb/204Pb` = c(18.244, 18.419, 18.050),
+#'   `207Pb/204Pb` = c(15.634, 15.658, 15.620),
+#'   `208Pb/204Pb` = c(38.407, 38.638, 38.157),
+#'   `206Pb/204Pb_err2SD` = c(0.001, 0.001, 0.001),
+#'   `207Pb/204Pb_err2SD` = c(0.001, 0.001, 0.001),
+#'   `208Pb/204Pb_err2SD` = c(0.002, 0.002, 0.002),
+#'   `207Pb/206Pb` = c(0.857, 0.850, 0.865),
+#'   `208Pb/206Pb` = c(2.105, 2.098, 2.114),
+#'   `207Pb/206Pb_err2SD` = c(0.00001, 0.00001, 0.00001),
+#'   `208Pb/206Pb_err2SD` = c(0.00004, 0.00004, 0.00004),
+#'   check.names = FALSE
+#' )
+#'
+#' ref <- data.frame(
+#'   ID = c("Ore_A", "Ore_B", "Ore_C"),
+#'   `206Pb/204Pb` = c(18.242, 18.500, 18.048),
+#'   `207Pb/204Pb` = c(15.633, 15.700, 15.619),
+#'   `208Pb/204Pb` = c(38.405, 38.800, 38.155),
+#'   `206Pb/204Pb_err2SD` = c(0.001, 0.001, 0.001),
+#'   `207Pb/204Pb_err2SD` = c(0.001, 0.001, 0.001),
+#'   `208Pb/204Pb_err2SD` = c(0.002, 0.002, 0.002),
+#'   `207Pb/206Pb` = c(0.857, 0.850, 0.865),
+#'   `208Pb/206Pb` = c(2.105, 2.098, 2.114),
+#'   `207Pb/206Pb_err2SD` = c(0.00001, 0.00001, 0.00001),
+#'   `208Pb/206Pb_err2SD` = c(0.00004, 0.00004, 0.00004),
+#'   check.names = FALSE
+#' )


Columns for 204Pb/206Pb and respective analytical uncertainties are missing.

add AMALIA algorithm for lead isotope provenance matching

a41fe44

Abagna123 requested a review from archaeothommy June 14, 2026 15:33

Abagna123 assigned archaeothommy Jun 14, 2026

update documentation and code styling.

9866378

archaeothommy requested changes Jun 14, 2026

View reviewed changes

streamline AMALIA with single match loop and cleaned params

a556562

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add AMALIA algorithm for lead isotope provenance matching#51

add AMALIA algorithm for lead isotope provenance matching#51
Abagna123 wants to merge 3 commits into
mainfrom
amalia

Abagna123 commented Jun 14, 2026

Uh oh!

archaeothommy left a comment •

edited

Loading

Uh oh!

archaeothommy Jun 14, 2026

Uh oh!

archaeothommy Jun 14, 2026

Uh oh!

archaeothommy Jun 14, 2026

Uh oh!

archaeothommy Jun 14, 2026

Uh oh!

archaeothommy Jun 14, 2026

Uh oh!

archaeothommy Jun 14, 2026

Uh oh!

archaeothommy Jun 14, 2026

Uh oh!

archaeothommy Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	length(unmatched), " sample(s) had no matches: ",
	length(unmatched), " sample(s) without matches: ",

Conversation

Abagna123 commented Jun 14, 2026

Uh oh!

archaeothommy left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

archaeothommy left a comment •

edited

Loading