Conversation
| #' "207Pb/206Pb_err2SD", "208Pb/206Pb_err2SD")`. | ||
| #' @param id_sample String with the column name of the sample IDs in `df`. | ||
| #' Default is `"ID"`. | ||
| #' @param id_ref String with the column name of the reference groups in `ref`. |
There was a problem hiding this comment.
This would be better referred to as "group" rather than individual IDs. That's how it must be handled at least (this is admittedly not well documented in the pointcloud-function). We cannot expect that reference data have individual IDs. A table with e.g. the regions as only information except the isotope ratios and analytical values must be sufficient.
| diff_1 = abs(as.numeric(df[matched$i, iso[1]]) - | ||
| as.numeric(ref[matched$j, iso[1]])), | ||
| diff_2 = abs(as.numeric(df[matched$i, iso[2]]) - | ||
| as.numeric(ref[matched$j, iso[2]])), | ||
| diff_3 = abs(as.numeric(df[matched$i, iso[3]]) - | ||
| as.numeric(ref[matched$j, iso[3]])), |
There was a problem hiding this comment.
Is this necessary? I thought it is a TRUE/FALSE type of matching. If it is required, why don't you recycle the values calculated in lines 142 and 143?
| #' @param ratios_204 Character vector of length 3 with column names of the | ||
| #' 204Pb-normalised isotope ratios. Must be identical in `df` and `ref`. | ||
| #' Default to `c("206Pb/204Pb", "207Pb/204Pb", "208Pb/204Pb")`. | ||
| #' @param ratios_206 Character vector of length 3 with column names of the | ||
| #' 206Pb-normalised isotope ratios. Must be identical in `df` and `ref`. Only | ||
| #' used when `triplet = "206"` or `triplet = "both"`. Default to | ||
| #' `c("206Pb/204Pb", "207Pb/206Pb", "208Pb/206Pb")`. | ||
| #' @param error_204 Character vector of length 3 with column names of the | ||
| #' analytical uncertainty in 2SD for the 204Pb-normalised ratios. Must be | ||
| #' identical in both `df` and `ref`. Default to `c("206Pb/204Pb_err2SD", | ||
| #' "207Pb/204Pb_err2SD", "208Pb/204Pb_err2SD")`. | ||
| #' @param error_206 Character vector of length 3 with column names of the | ||
| #' analytical uncertainty (2SD) for the 206Pb-normalised ratios. Must be | ||
| #' identical in both `df` and `ref`. Only used when `triplet = "206"` or | ||
| #' `triplet = "both"`. Default to `c("206Pb/204Pb_err2SD", | ||
| #' "207Pb/206Pb_err2SD", "208Pb/206Pb_err2SD")`. |
There was a problem hiding this comment.
Make sure that the function requires only the input of the triplet that should be used. I think this applies only to the checks. The input for the other triplet should just be ignored.
It is probably best to make this check based on the value of triplet. I did not saw a check that the column names and the value of triplet are matching.
| if (triplet == "204Pb" || triplet == "both") { | ||
| matches_204 <- run_amalia(ratios_204, error_204) | ||
| } | ||
|
|
||
| if (triplet == "206Pb" || triplet == "both") { | ||
| matches_206 <- run_amalia(ratios_206, error_206) | ||
| } | ||
|
|
||
| # Combine results | ||
| if (triplet == "204Pb") { | ||
| matches <- matches_204 | ||
| } | ||
|
|
||
| if (triplet == "206Pb") { | ||
| matches <- matches_206 | ||
| } | ||
|
|
||
| if (triplet == "both") { | ||
| if (nrow(matches_204) == 0 || nrow(matches_206) == 0) { | ||
| matches <- data.frame() | ||
| } else { | ||
| pair_204 <- paste(matches_204$sample_id, matches_204$ref_id) | ||
| pair_206 <- paste(matches_206$sample_id, matches_206$ref_id) | ||
| # Strict intersection — must pass in both triplets | ||
| keep_204 <- pair_204 %in% pair_206 | ||
| keep_206 <- pair_206 %in% pair_204 | ||
| # Combine diff columns from both triplets for matched pairs | ||
| matches <- cbind( | ||
| matches_204[keep_204, ], | ||
| matches_206[keep_206, 3:5] | ||
| ) | ||
| } | ||
| } |
There was a problem hiding this comment.
I think this all can be simplified by having the matching running over all isotope ratios at once if triplet = "both". The algorithm returns FALSE when one value in a triplet doesn't match. Consequently, running the matching of all six ratios at the same time will give the same result as running the matching for each triplet separately and figuring out which matches both triples have in common.
Implementing this would require to use triplet only for determining which isotope ratios should be supplied to run_amalia() and all output of this function can then be handled the same.
| stop("Reference ID column '", id_ref, "' not found in reference data.") | ||
| } | ||
|
|
||
| run_amalia <- function(iso, err) { |
There was a problem hiding this comment.
Please avoid undocumented function nesting. See e.g. the ASTR parsing functions how they should be arranged and documented.
| if (nrow(matches) > 0) { | ||
| n_matches <- as.data.frame(table(matches$sample_id)) | ||
| names(n_matches) <- c("sample_id", "n_matches") | ||
| missing <- setdiff(all_ids, n_matches$sample_id) | ||
| if (length(missing) > 0) { | ||
| n_matches <- rbind( | ||
| n_matches, | ||
| data.frame(sample_id = missing, n_matches = 0L) | ||
| ) | ||
| } | ||
| n_matches <- n_matches[match(all_ids, n_matches$sample_id), ] | ||
| } else { | ||
| n_matches <- data.frame( | ||
| sample_id = all_ids, | ||
| n_matches = 0L | ||
| ) | ||
| } |
There was a problem hiding this comment.
It should be possible to simplify this with ifelse(), probably something similar to:
all_ids$n_matches <- ifelse(all_ids `%in%` matches$sample_id, <get summarised values from table() call>, 0L)
|
|
||
| if (length(unmatched) > 0) { | ||
| message( | ||
| length(unmatched), " sample(s) had no matches: ", |
There was a problem hiding this comment.
| length(unmatched), " sample(s) had no matches: ", | |
| length(unmatched), " sample(s) without matches: ", |
| #' df <- data.frame( | ||
| #' ID = c("Art1", "Art2", "Art3"), | ||
| #' `206Pb/204Pb` = c(18.244, 18.419, 18.050), | ||
| #' `207Pb/204Pb` = c(15.634, 15.658, 15.620), | ||
| #' `208Pb/204Pb` = c(38.407, 38.638, 38.157), | ||
| #' `206Pb/204Pb_err2SD` = c(0.001, 0.001, 0.001), | ||
| #' `207Pb/204Pb_err2SD` = c(0.001, 0.001, 0.001), | ||
| #' `208Pb/204Pb_err2SD` = c(0.002, 0.002, 0.002), | ||
| #' `207Pb/206Pb` = c(0.857, 0.850, 0.865), | ||
| #' `208Pb/206Pb` = c(2.105, 2.098, 2.114), | ||
| #' `207Pb/206Pb_err2SD` = c(0.00001, 0.00001, 0.00001), | ||
| #' `208Pb/206Pb_err2SD` = c(0.00004, 0.00004, 0.00004), | ||
| #' check.names = FALSE | ||
| #' ) | ||
| #' | ||
| #' ref <- data.frame( | ||
| #' ID = c("Ore_A", "Ore_B", "Ore_C"), | ||
| #' `206Pb/204Pb` = c(18.242, 18.500, 18.048), | ||
| #' `207Pb/204Pb` = c(15.633, 15.700, 15.619), | ||
| #' `208Pb/204Pb` = c(38.405, 38.800, 38.155), | ||
| #' `206Pb/204Pb_err2SD` = c(0.001, 0.001, 0.001), | ||
| #' `207Pb/204Pb_err2SD` = c(0.001, 0.001, 0.001), | ||
| #' `208Pb/204Pb_err2SD` = c(0.002, 0.002, 0.002), | ||
| #' `207Pb/206Pb` = c(0.857, 0.850, 0.865), | ||
| #' `208Pb/206Pb` = c(2.105, 2.098, 2.114), | ||
| #' `207Pb/206Pb_err2SD` = c(0.00001, 0.00001, 0.00001), | ||
| #' `208Pb/206Pb_err2SD` = c(0.00004, 0.00004, 0.00004), | ||
| #' check.names = FALSE | ||
| #' ) |
There was a problem hiding this comment.
Columns for 204Pb/206Pb and respective analytical uncertainties are missing.
Hi @archaeothommy, I replaced the nested loops with expand.grid and mapply. The function is working and all tests pass (I haven't pushed the test function yet). Since the Argentina database has no measurement errors, I derived absolute 2SD uncertainties for the reference data using the SRM-981 interlaboratory ranges from the paper (0.10%, 0.14%, 0.20%, 0.13%, 0.20%) multiplied by each ratio value.