| name | IOBRskill |
|---|---|
| description | End-to-end bulk RNA-seq / microarray tumor microenvironment (TME) analysis using the IOBR R package. Covers data preprocessing, gene annotation, TME deconvolution (CIBERSORT/MCPcounter/EPIC/xCell/ESTIMATE/TIMER/quanTIseq/IPS), signature scoring (PCA/ssGSEA/Z-score), ligand-receptor analysis, pathway enrichment, survival modeling, and Nature-style visualization. Use this skill when the user mentions: tumor microenvironment analysis, TME deconvolution, immune infiltration analysis, 肿瘤微环境分析, 肿瘤微环境解析, 免疫浸润分析, ligand-receptor analysis, 受体配体分析, IOBR, immune cell deconvolution, CIBERSORT, MCPcounter, ESTIMATE, or any bulk RNA-seq immuno-oncology workflow. Also trigger on /IOBRskill. |
Every R script generated by this skill MUST begin with this citation header:
###############################################################################
# IOBR Citation:
# Zeng DQ, Fang YR, … , Yu GC, Liao WJ.
# Enhancing Immuno-Oncology Investigations Through Multidimensional Decoding of
# Tumour Microenvironment with IOBR 2.0, Cell Reports Methods, 2024
# https://doi.org/10.1016/j.crmeth.2024.100910
###############################################################################The same citation header MUST appear at the top of every log file in 06-log/.
Verify IOBR is installed in the base conda environment:
source /home/bio/miniconda3/etc/profile.d/conda.sh
conda activate base
Rscript -e 'packageVersion("IOBR")'If IOBR is not installed:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("IOBR/IOBR")IOBR relies on several packages that are NOT auto-installed. Install these before starting:
# For CIBERSORT
BiocManager::install("preprocessCore")
# For quanTIseq deconvolution
install.packages("limSolve", repos = "https://cloud.r-project.org")
# For tme_cluster()
install.packages("NbClust", repos = "https://cloud.r-project.org")
# For iobr_pca()
install.packages("FactoMineR", repos = "https://cloud.r-project.org")
# For sig_heatmap() — only if using tidyHeatmap approach (pheatmap preferred)
# install.packages("tidyHeatmap", repos = "https://cloud.r-project.org")IOBR ships with rich built-in data — you rarely need external annotation files. For full details, read references/iobr_built_in_data.md. Key resources:
| Resource | Access | What It Contains |
|---|---|---|
| Gene length / ID mapping | count2tpm() auto-loads |
111 genes with HUGO, ENTREZID, ENSEMBL |
| Microarray probe sets | extdata/probesets.txt |
176 probes mapped to cell types |
| TME signatures (186) | IOBR:::signature_tme |
Immune, stromal, EMT, checkpoint signatures |
| Full collection (323) | data("signature_collection") |
All signatures + Hallmark + Metabolism |
| Signature groupings | data("sig_group") |
35+ categories for filtering signatures |
| Pathway data (on-demand) | IOBR::load_data() |
hallmark (50), go_bp (7658), kegg (186), reactome (1615) |
| Example datasets | data("pdata_stad") etc. |
TCGA-STAD, IMvigor210 toy data |
IOBR provides IOBR::load_data() to download and cache pathway gene sets from GitHub on first use. These are essential for comprehensive pathway scoring:
hallmark <- IOBR::load_data("hallmark") # 50 pathways
go_bp <- IOBR::load_data("go_bp") # 7,658 gene sets
go_cc <- IOBR::load_data("go_cc")
go_mf <- IOBR::load_data("go_mf")
kegg <- IOBR::load_data("kegg") # 186 pathways
reactome <- IOBR::load_data("reactome") # 1,615 pathwaysCreate this structure before starting analysis:
project/
├── 01-script/ # R scripts (numbered, each with IOBR citation header)
├── 02-input/ # Normalized and annotated expression matrices
│ └── pdata_summary.csv # Phenotype variable summary
├── 03-tme/ # TME deconvolution and signature scoring results
├── 04-figs/
│ ├── *.png/pdf # All figures (PNG 300dpi + PDF, numbered naming)
│ ├── kmplot/ # KM survival plots (auto-saved by sig_surv_plot)
│ └── data/ # Statistical result tables matching figures
├── 05-note/
│ ├── IOBR-pipeline.md # Analysis plan (generated in Phase 0)
│ └── IOBR-analysis-README.md # Final analysis summary
└── 06-log/ # R execution logs (one per script, with citation header)
Naming conventions:
- Scripts:
01-data_preprocessing.R,02-tme_deconvolution.R, ... - Figures:
Fig01-barplot_cibersort.png+Fig01-barplot_cibersort.pdf(always dual format) - Statistical tables:
04-figs/data/01-tme_pdata_merged.csv,04-figs/data/04-wilcoxon_results.csv, ... - Logs:
01-data_preprocessing.log,02-tme_deconvolution.log, ... - Note:
IOBR-analysis-README.md(tree diagram + brief interpretation)
When running a script, capture the log:
Rscript 01-script/01-data_preprocessing.R > 06-log/01-data_preprocessing.log 2>&16 sequential phases (Phase 0–5). Pause at decision points to let the user choose.
Before writing any analysis code, generate an analysis plan document.
After confirming data type / species / deconvolution methods with the user (Phase 1 questions), create 05-note/IOBR-pipeline.md with:
- Project metadata: project name, data source, data type, species
- ASCII tree diagram of all scripts, their inputs/outputs, and expected results:
01-data_preprocessing.R
├─ Input: raw expression matrix + phenotype data
├─ Output: 02-input/annotated_eset.csv, 02-input/pdata.csv
└─ Notes: log2 transformation, gene ID conversion
02-tme_deconvolution.R
├─ Input: 02-input/annotated_eset.csv
├─ Output: 03-tme/tme_cibersort.csv, 03-tme/tme_mcpcounter.csv, ...
└─ Notes: CIBERSORT + MCPcounter + ESTIMATE (linear-scale data)
03-signature_analysis.R
├─ Input: 02-input/annotated_eset.csv
├─ Output: 03-tme/sig_score_tme.csv, 03-tme/sig_score_pathway.RData
└─ Notes: ssGSEA for TME signatures, Hallmark + KEGG pathways
04-statistical_analysis.R
├─ Input: 03-tme/*.csv + 02-input/pdata.csv
├─ Output: 04-figs/data/01-tme_pdata_merged.csv
│ 04-figs/data/02-tme_scaled.csv
│ 04-figs/data/03-tme_subtype.csv
│ 04-figs/data/04-wilcoxon_results.csv
│ 04-figs/data/05-surv_results.csv
│ 04-figs/data/06-cor_matrix.csv
└─ Notes: merge → scale → cluster → wilcoxon → survival → correlation
05-visualization.R
├─ Input: 04-figs/data/*.csv + 03-tme/*.csv
├─ Output: 04-figs/Fig01-09 (PNG + PDF)
└─ Notes: barplots → heatmaps → correlation → boxplots → forest → KM
- Present to user: "开始分析" or "调整计划"
- Only after approval, begin Phase 1
Script: 01-data_preprocessing.R
- Data source: User matrix, GEO (GSE#####), or TCGA
- Data type:
"Count"(raw RNA-seq),"Tpm"(already normalized), or"Array"(microarray) - Species:
"hsa"(human) or"mmu"(mouse)
Before any analysis, verify sample overlap between expression matrix and phenotype data:
# Report sample counts
eset_samples <- colnames(eset)
pdata_samples <- pdata$ID # or rownames(pdata)
intersection <- intersect(eset_samples, pdata_samples)
cat("Expression matrix samples:", length(eset_samples), "\n")
cat("Phenotype data samples:", length(pdata_samples), "\n")
cat("Matched samples:", length(intersection), "\n")
# Ask user: use all samples, intersection only, or check data?
# Then filter both to agreed sample set
eset <- eset[, intersection]
pdata <- pdata[pdata$ID %in% intersection, ]After loading pdata, generate a summary report:
# Generate pdata_summary.csv
pdata_summary <- data.frame(
var = character(),
type = character(),
n_levels = integer(),
levels = character(),
stringsAsFactors = FALSE
)
for (col in colnames(pdata)) {
vals <- pdata[[col]]
if (is.numeric(vals)) {
tp <- ifelse(length(unique(vals)) <= 2, "binary", "numeric")
lvls <- paste(range(vals, na.rm = TRUE), collapse = " - ")
n_lv <- length(unique(vals))
} else {
tp <- "factor"
lvls <- paste(unique(vals)[1:min(10, length(unique(vals)))], collapse = ", ")
n_lv <- length(unique(vals))
}
pdata_summary <- rbind(pdata_summary, data.frame(
var = col, type = tp, n_levels = n_lv, levels = lvls
))
}
write.csv(pdata_summary, file = "02-input/pdata_summary.csv", row.names = FALSE)
cat(">>>>> pdata summary saved to 02-input/pdata_summary.csv\n")This summary helps decide which statistical tests to apply in Phase 4.
| Function | Purpose | When to Use |
|---|---|---|
count2tpm() |
Count → TPM + gene ID mapping | Raw RNA-seq counts |
log2eset() |
Auto-detect and apply log2 transform | Ensures log2 scale for downstream analysis |
anno_eset() |
Probe → gene symbol annotation | Microarray data or unannotated matrices |
remove_batcheffect() |
ComBat / ComBat-seq batch correction | Multi-batch datasets |
find_outlier_samples() |
WGCNA-based outlier detection | QC step |
iobr_pca() |
PCA visualization for batch assessment | QC step |
mouse2human_eset() |
Mouse → human gene ortholog conversion | Mouse data |
# RNA-seq: counts → TPM
eset <- count2tpm(countMat = expr_matrix, idType = "Ensembl", org = "hsa")
# Microarray: probe annotation
eset <- anno_eset(eset = expr_matrix, annotation = anno_df,
symbol = "gene_symbol", probe = "probe_id", method = "mean")
# Ensure log2 scale (auto-detects — only transforms if not already log2)
eset <- log2eset(eset)Save output:
write.csv(eset, file = "02-input/annotated_eset.csv")
write.csv(pdata, file = "02-input/pdata.csv")Script: 02-tme_deconvolution.R
Ask the user which method(s) to use.
| Method | Cell Types | Best For |
|---|---|---|
"cibersort" |
22 (LM22) | Gold standard, comprehensive |
"cibersort_abs" |
22 (LM22) | CIBERSORT absolute mode (fraction-independent) |
"mcpcounter" |
8 populations | Quick, stromal + immune |
"epic" |
8 incl. cancer cells | Tumor purity |
"xcell" |
64 types | Broadest coverage (slow) |
"estimate" |
4 scores | Stromal/Immune/Tumor purity |
"timer" |
6 types | TCGA cancer-specific |
"quantiseq" |
10 incl. M1/M2 | Macrophage polarization |
"ips" |
4 axes | Immunotherapy response |
Recommended: CIBERSORT + MCPcounter + ESTIMATE (most cited). For comprehensive analysis, run all 9 methods and merge.
tme_result <- deconvo_tme(eset = eset, method = "cibersort", arrays = FALSE, perm = 1000)
write.csv(tme_result, file = "03-tme/tme_cibersort.csv")# Each method may take 1-5 minutes depending on sample size
# Use tryCatch for each method — some may fail on certain datasets
cibersort <- tryCatch(deconvo_tme(eset = eset, method = "cibersort",
arrays = FALSE, perm = 1000), error = function(e) NULL)
cibersort_abs <- tryCatch(deconvo_tme(eset = eset, method = "cibersort_abs",
arrays = FALSE, perm = 1000), error = function(e) NULL)
epic <- tryCatch(deconvo_tme(eset = eset, method = "epic",
arrays = FALSE), error = function(e) NULL)
mcp <- tryCatch(deconvo_tme(eset = eset, method = "mcpcounter"), error = function(e) NULL)
xcell <- tryCatch(deconvo_tme(eset = eset, method = "xcell",
arrays = FALSE), error = function(e) NULL)
estimate <- tryCatch(deconvo_tme(eset = eset, method = "estimate"), error = function(e) NULL)
timer <- tryCatch(deconvo_tme(eset = eset, method = "timer",
group_list = rep(tumor_type, ncol(eset))), error = function(e) NULL)
quantiseq <- tryCatch(deconvo_tme(eset = eset, method = "quantiseq",
tumor = TRUE, arrays = FALSE, scale_mrna = TRUE), error = function(e) NULL)
ips <- tryCatch(deconvo_tme(eset = eset, method = "ips", plot = FALSE), error = function(e) NULL)
# Merge all results by sample ID using base R merge()
results_list <- list(cibersort, cibersort_abs, mcp, xcell, epic, estimate, quantiseq, timer, ips)
results_list <- results_list[!sapply(results_list, is.null)]
tme_combine <- results_list[[1]]
for (i in seq_along(results_list)[-1]) {
tme_combine <- merge(tme_combine, results_list[[i]], by = "ID")
}
save(tme_combine, file = "03-tme/tme_combine.RData")
write.csv(tme_combine, file = "03-tme/tme_combine.csv", row.names = FALSE)- All deconvolution methods (except IPS) need linear-scale data — Anti-log first:
eset_linear <- 2^esetif log2-transformed. CIBERSORT, MCPcounter, ESTIMATE, EPIC, quanTIseq, xCell all warn "Data values appear small (<50)" when fed log2 data. Only IPS works with log2 data directly. - Column names have method suffix —
T_cells_CD8_CIBERSORT,T_cells_MCPcounter, etc. - Output CSV has
IDcolumn — Set rownames:rownames(df) <- df$ID; df$ID <- NULL. arrays = TRUEfor microarray;perm = 1000recommended for CIBERSORT.- Required packages —
preprocessCore(CIBERSORT),limSolve(quanTIseq). Install before running. - TIMER requires
group_list— Provide cancer type per sample:rep(tumor_type, ncol(eset)). - quanTIseq parameters — Use
tumor = TRUE, arrays = FALSE, scale_mrna = TRUEfor tumor samples. - Use
tryCatchfor each method — some methods may fail on certain datasets. Wrap each call to avoid losing partial results.
Script: 03-signature_analysis.R
| Method | Best For |
|---|---|
"ssgsea" |
Default, robust |
"pca" |
Continuous scores, large gene sets |
"zscore" |
Simple, fast |
"integration" |
Combined approach |
# Option A: TME signatures (186 sets, internal — use :::)
signature_tme <- IOBR:::signature_tme
sig_tme <- calculate_sig_score(pdata = NULL, eset = eset,
signature = signature_tme,
method = "ssgsea",
mini_gene_count = 3,
adjust_eset = TRUE)
write.csv(sig_tme, file = "03-tme/sig_score_tme.csv")
# Option B: Full collection (323 sets) — for comprehensive analysis
data("signature_collection")
sig_res <- calculate_sig_score(pdata = NULL, eset = eset,
signature = signature_collection,
method = "pca",
mini_gene_count = 2,
adjust_eset = TRUE)
write.csv(sig_res, file = "03-tme/sig_score_collection.csv")hallmark <- IOBR::load_data("hallmark") # 50 pathways
go_bp <- IOBR::load_data("go_bp") # 7,658 gene sets
go_cc <- IOBR::load_data("go_cc")
go_mf <- IOBR::load_data("go_mf")
kegg <- IOBR::load_data("kegg") # 186 pathways
reactome <- IOBR::load_data("reactome") # 1,615 pathways
sig_pathway <- calculate_sig_score(pdata = NULL, eset = eset,
signature = c(hallmark, go_bp, go_cc, go_mf, kegg, reactome),
method = "ssgsea",
mini_gene_count = 3)
write.csv(sig_pathway, file = "03-tme/sig_score_pathway.csv")tme_sig_combine <- merge(tme_combine, sig_res, by = "ID")
tme_sig_combine <- merge(tme_sig_combine, sig_pathway, by = "ID")
write.csv(tme_sig_combine, file = "03-tme/tme_sig_combine.csv", row.names = FALSE)pdata = NULL, adjust_eset = TRUE— This is the correct production pattern forcalculate_sig_score(). Do NOT setadjust_eset = FALSEunless data is already properly scaled.- Method names are lowercase — Use
"pca","ssgsea","zscore","integration"(NOT uppercase"PCA"). signature_tmeis internal — Access viaIOBR:::signature_tme(triple colon).signature_collectionneedsdata()— Calldata("signature_collection")before use.IOBR::load_data()downloads on first use — Caches locally for subsequent calls.- Pathway scoring is computationally heavy — Consider running Hallmark (50) and KEGG (186) first for quick results, then GO/Reactome for comprehensive analysis.
- Use base R
merge()for joining — Not all environments havetidyverse. Prefermerge(df1, df2, by = "ID")over%>% inner_join(). tme_cluster()requires NbClust — Install:install.packages("NbClust").
Use references/iobr_built_in_data.md for the full signature catalog and data("sig_group") for filtering by category.
tme_clusters <- tme_cluster(input = tme_result, features = cell_types,
id = "sample_id", method = "kmeans")Script: 04-statistical_analysis.R
This phase produces structured output: every statistical result is saved to 04-figs/data/ as a numbered CSV file for traceability and reuse in Phase 5 visualization.
Merge TME results with pdata, then scale all TME cell columns:
# Load data
pdata <- read.csv("02-input/pdata.csv")
tme <- read.csv("03-tme/tme_combine.csv")
# Merge pdata with TME results
tme_pdata <- merge(pdata, tme, by = "ID")
write.csv(tme_pdata, file = "04-figs/data/01-tme_pdata_merged.csv", row.names = FALSE)
# Collect TME cell columns — MUST match actual column names in merged data
# After merge, cibersort + cibersort_abs shared names get .x/.y suffixes
# So match by checking actual colname OR colname-without-suffix against original list
qc_patterns <- c("^P[.]", "^Correlation", "^RMSE")
all_tme_cell_cols <- colnames(tme_pdata)[
colnames(tme_pdata) %in% original_cell_cols |
gsub("\\.[xy]$", "", colnames(tme_pdata)) %in% original_cell_cols
]
all_tme_cell_cols <- setdiff(all_tme_cell_cols, "ID")
# Z-score scale all TME cell columns (per column = per variable)
tme_scaled <- tme_pdata
tme_scaled[, all_tme_cell_cols] <- scale(tme_scaled[, all_tme_cell_cols])
write.csv(tme_scaled, file = "04-figs/data/02-tme_scaled.csv", row.names = FALSE)Cluster samples by TME cell composition using CIBERSORT fractions:
# Extract CIBERSORT cell columns for clustering
cb_cols <- grep("_CIBERSORT$", colnames(tme_pdata), value = TRUE)
cb_cols <- setdiff(cb_cols, grep("P[.]value|Correlation|RMSE", cb_cols, value = TRUE))
# K-means clustering on scaled CIBERSORT data
set.seed(123)
tme_subtype <- tme_cluster(input = tme_scaled, features = cb_cols,
id = "ID", method = "kmeans", max.nc = 6)
# tme_cluster returns data.frame with columns: ID, cluster, <cell_cols>
# Extract cluster assignments and label as TME subtypes
subtype_df <- data.frame(ID = tme_subtype$ID, TME_subtype = paste0("TME", tme_subtype$cluster))
write.csv(subtype_df, file = "04-figs/data/03-tme_subtype.csv", row.names = FALSE)
# Find representative TME variables per subtype
markers <- find_markers_in_bulk(pdata = tme_scaled, eset = tme_scaled,
group = "TME_subtype", nfeatures = 50)
write.csv(markers, file = "04-figs/data/03-tme_subtype_markers.csv", row.names = FALSE)Run statistical tests based on what's available in pdata:
# --- Wilcoxon test (IF categorical variables exist in pdata) ---
# Check pdata_summary.csv for factor/binary variables
pdata_summary <- read.csv("02-input/pdata_summary.csv")
cat_vars <- pdata_summary$var[pdata_summary$type %in% c("factor", "binary")]
if (length(cat_vars) > 0) {
for (v in cat_vars) {
lvls <- unique(tme_pdata[[v]])
lvls <- lvls[!is.na(lvls)]
if (length(lvls) == 2) {
wilcox_res <- batch_wilcoxon(data = tme_pdata, target = v, feature = cell_cols)
write.csv(wilcox_res,
file = paste0("04-figs/data/04-wilcoxon_", v, ".csv"),
row.names = FALSE)
}
}
}
# --- Survival analysis (IF survival data exists) ---
# Check for time + status columns
time_cols <- grep("time|Time|OS_time|PFS_time", colnames(tme_pdata), value = TRUE)
status_cols <- grep("status|Status|OS_status|PFS_status", colnames(tme_pdata), value = TRUE)
if (length(time_cols) > 0 && length(status_cols) > 0) {
time_col <- time_cols[1]
status_col <- status_cols[1]
# Batch Cox regression for all TME cell columns
surv_res <- batch_surv(pdata = tme_pdata, variable = cell_cols,
time = time_col, status = status_col)
write.csv(surv_res, file = "04-figs/data/05-surv_results.csv", row.names = FALSE)
# Per-method survival (if multiple deconvolution methods were run)
methods_run <- unique(gsub(".*_", "", cell_cols))
for (m in methods_run) {
m_cols <- grep(paste0("_", m, "$"), cell_cols, value = TRUE)
if (length(m_cols) > 0) {
m_surv <- batch_surv(pdata = tme_pdata, variable = m_cols,
time = time_col, status = status_col)
write.csv(m_surv,
file = paste0("04-figs/data/05-surv_", m, ".csv"),
row.names = FALSE)
}
}
}Key notes for survival:
batch_surv()requires merged pdata — All variables must be columns inpdata. Usemerge()first.batch_surv()parameter isvariable— A vector of column names, NOT a separate data frame.batch_surv()output columns:P,HR,CI_low_0.95,CI_up_0.95,ID.
Correlation matrix of all TME variables:
# batch_cor(data, target, feature, method) — correlates one target vs multiple features
# NOTE: batch_cor() returns target-vs-each-feature, NOT a pairwise matrix.
# For a full pairwise correlation matrix, use base R cor() instead (see below).
# Key immune cells as target, all TME cells as features
key_cells <- c("T_cells_CD8_CIBERSORT", "Macrophages_M1_CIBERSORT")
cor_all <- data.frame()
for (tc in key_cells) {
cor_res <- batch_cor(data = tme_pdata, target = tc, feature = cell_cols, method = "spearman")
cor_res$target <- tc
cor_all <- rbind(cor_all, cor_res)
}
write.csv(cor_all, file = "04-figs/data/06-cor_matrix.csv", row.names = FALSE)
# For full pairwise correlation matrix (for heatmap visualization), use base R:
cor_mat <- cor(tme_pdata[, cell_cols], method = "spearman", use = "pairwise.complete.obs")
save(cor_mat, file = "04-figs/data/06-cor_matrix_full.RData")| Function | Purpose | Output |
|---|---|---|
batch_pcc() |
Batch partial correlation controlling for a third variable | Statistics table |
iobr_deg() |
DEG using DESeq2 or limma, with built-in volcano plot and heatmap | Statistics + figures |
sig_gsea() |
Gene Set Enrichment Analysis via fgsea, with built-in plots | Statistics + figures |
subgroup_survival() |
Subgroup Cox analysis by covariates | Statistics table |
LR_cal() |
Ligand-receptor interaction analysis | Statistics + figures |
find_mutations() |
Mutation-TME interaction (TCGA only) | Statistics table |
feature_manipulation() |
Clean features: remove NA, outliers, zero-variance | Cleaned data |
Script: 05-visualization.R
ALL TME data MUST be z-score scaled before visualization. Statistical result tables are in 04-figs/data/.
Every figure MUST:
- Use IOBR's built-in visualization functions whenever available
- Export as both PNG (300dpi) + PDF
- Follow the strict ordering below
IMPORTANT: Add pdf(NULL) at the top of the visualization script to suppress stray Rplots.pdf from IOBR internal plot calls. Also clean up at the end: if (file.exists("Rplots.pdf")) file.remove("Rplots.pdf").
Every sig_box() call in this pipeline MUST use these fixed parameters:
sig_box(data = data, signature = feature_col, variable = group_col,
palette = "paired1", show_pairwise_p = TRUE,
angle_x_text = 60, hjust = 1,
size_of_pvalue = 3, size_of_font = 5)For multi-panel layouts, combine individual sig_box() plots with patchwork::wrap_plots().
IMPORTANT: Only methods that produce cell FRACTION data (CIBERSORT, EPIC) should use cell_bar_plot(). ESTIMATE scores are NOT fractions.
tme_cb <- read.csv("03-tme/tme_cibersort.csv")
# Filter QC columns
qc_cols <- grep("^[Pp][.]", colnames(tme_cb), value = TRUE)
qc_cols <- c(qc_cols, grep("^Correlation", colnames(tme_cb), value = TRUE),
grep("^RMSE", colnames(tme_cb), value = TRUE))
cell_cols_cb <- setdiff(grep("_CIBERSORT$", colnames(tme_cb), value = TRUE), qc_cols)
p1 <- cell_bar_plot(input = tme_cb, id = "ID", features = cell_cols_cb,
title = "CIBERSORT TME Composition",
coord_flip = TRUE, palette = 4, legend.position = "right")
p1 <- p1 + guides(fill = guide_legend(ncol = 1)) +
theme(legend.text = element_text(size = 5),
legend.key.size = unit(2.5, "mm"),
legend.title = element_blank(),
plot.title = element_text(size = 10, hjust = 0.5))
if (nrow(tme_cb) > 50) {
p1 <- p1 + theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
}
ggsave("04-figs/Fig01-barplot_cibersort.png", p1, width = 180, height = 250, units = "mm", dpi = 300)
ggsave("04-figs/Fig01-barplot_cibersort.pdf", p1, width = 180, height = 250, units = "mm")tme_ep <- read.csv("03-tme/tme_epic.csv")
epic_cols <- grep("_EPIC$", colnames(tme_ep), value = TRUE)
p2 <- cell_bar_plot(input = tme_ep, id = "ID", features = epic_cols,
title = "EPIC TME Composition",
coord_flip = TRUE, palette = 4, legend.position = "right")
p2 <- p2 + guides(fill = guide_legend(ncol = 1)) +
theme(legend.text = element_text(size = 5),
legend.key.size = unit(2.5, "mm"),
legend.title = element_blank(),
plot.title = element_text(size = 10, hjust = 0.5))
ggsave("04-figs/Fig02-barplot_epic.png", p2, width = 180, height = 250, units = "mm", dpi = 300)
ggsave("04-figs/Fig02-barplot_epic.pdf", p2, width = 180, height = 250, units = "mm")Cluster both rows and columns, no subtype annotation. Use pheatmap.
plot_heatmap_no_subtype <- function(data, cols, title, prefix,
width = 8, height = 6) {
mat <- t(as.matrix(data[, cols])) # variables as rows, samples as columns
mat[mat > 2] <- 2; mat[mat < -2] <- -2
rownames(mat) <- sub("\\.[xy]$", "", rownames(mat))
colnames(mat) <- data$ID
for (fmt in c("pdf", "png")) {
pheatmap::pheatmap(mat,
color = colorRampPalette(c("#2166AC", "white", "#B2182B"))(100),
show_colnames = FALSE, cluster_rows = TRUE, cluster_cols = TRUE,
treeheight_row = 15, treeheight_col = 15, fontsize_row = 8,
main = title,
filename = paste0("04-figs/", prefix, ".", fmt),
width = width, height = height, dpi = 300)
}
}
# Fig03a: CIBERSORT
plot_heatmap_no_subtype(tme_scaled, cb_sc, "CIBERSORT TME Landscape",
"Fig03a-heatmap_cibersort")
# Fig03b: MCPcounter
mcp_sc <- grep("MCPcounter$", colnames(tme_scaled), value = TRUE)
plot_heatmap_no_subtype(tme_scaled, mcp_sc, "MCPcounter TME Landscape",
"Fig03b-heatmap_mcpcounter", width = 6, height = 5)
# Fig03c: xCell
xc_sc <- grep("xCell$", colnames(tme_scaled), value = TRUE)
plot_heatmap_no_subtype(tme_scaled, xc_sc, "xCell TME Landscape",
"Fig03c-heatmap_xcell", width = 10, height = 8)USE sig_forest() — IOBR's built-in forest plot function. MUST filter extreme HR before plotting: remove variables with abs(HR) > 10 to prevent sig_forest() from producing blank plots (its coord_trans(x = "log2") only has 3 fixed breaks that break with extreme HR ranges).
if (length(time_cols) > 0 && length(status_cols) > 0) {
time_col <- time_cols[1]
status_col <- status_cols[1]
# Fig04a: CIBERSORT survival forest
cb_surv <- batch_surv(pdata = tme_pdata, variable = cb_sc,
time = time_col, status = status_col)
write.csv(cb_surv, file = "04-figs/data/05-surv_cibersort.csv", row.names = FALSE)
cb_surv_filt <- cb_surv[abs(as.numeric(cb_surv$HR)) <= 10, ]
if (nrow(cb_surv_filt) > 0) {
p04a <- sig_forest(cb_surv_filt, signature = "ID", n = 20)
ggsave("04-figs/Fig04a-forest_cibersort.png", p04a, width = 10, height = 6, units = "in", dpi = 300)
ggsave("04-figs/Fig04a-forest_cibersort.pdf", p04a, width = 10, height = 6)
}
# Fig04b: MCPcounter + EPIC + xCell survival forest
other_cols <- c(mcp_sc, epic_sc, xc_sc)
other_surv <- batch_surv(pdata = tme_pdata, variable = other_cols,
time = time_col, status = status_col)
write.csv(other_surv, file = "04-figs/data/05-surv_other_tme.csv", row.names = FALSE)
other_surv_filt <- other_surv[abs(as.numeric(other_surv$HR)) <= 10, ]
if (nrow(other_surv_filt) > 0) {
p04b <- sig_forest(other_surv_filt, signature = "ID", n = 20)
ggsave("04-figs/Fig04b-forest_other_tme.png", p04b, width = 10, height = 6, units = "in", dpi = 300)
ggsave("04-figs/Fig04b-forest_other_tme.pdf", p04b, width = 10, height = 6)
}
# Fig04c: TME signatures survival forest
tme_sig_surv <- batch_surv(pdata = tme_pdata, variable = tme_sig_cols,
time = time_col, status = status_col)
write.csv(tme_sig_surv, file = "04-figs/data/05-surv_tme_signatures.csv", row.names = FALSE)
tme_sig_surv_filt <- tme_sig_surv[abs(as.numeric(tme_sig_surv$HR)) <= 10, ]
if (nrow(tme_sig_surv_filt) > 0) {
p04c <- sig_forest(tme_sig_surv_filt, signature = "ID", n = 20)
ggsave("04-figs/Fig04c-forest_tme_signature.png", p04c, width = 10, height = 6, units = "in", dpi = 300)
ggsave("04-figs/Fig04c-forest_tme_signature.pdf", p04c, width = 10, height = 6)
}
# Fig04d: GO/KEGG/Hallmark survival forest
pathway_surv <- batch_surv(pdata = tme_pdata, variable = pathway_cols,
time = time_col, status = status_col)
write.csv(pathway_surv, file = "04-figs/data/05-surv_pathways.csv", row.names = FALSE)
pathway_surv_filt <- pathway_surv[abs(as.numeric(pathway_surv$HR)) <= 10, ]
if (nrow(pathway_surv_filt) > 0) {
p04d <- sig_forest(pathway_surv_filt, signature = "ID", n = 20)
ggsave("04-figs/Fig04d-forest_go_kegg.png", p04d, width = 10, height = 6, units = "in", dpi = 300)
ggsave("04-figs/Fig04d-forest_go_kegg.pdf", p04d, width = 10, height = 6)
}
}Key notes:
- MUST filter
abs(HR) > 10beforesig_forest()— extreme HR values (e.g., 0.001–190) causesig_forest()to produce blank plots because its internalcoord_trans(x = "log2")only has 3 fixed axis breaks. - Save full (unfiltered) survival results to
04-figs/data/— filtering is only for visualization. - Split into 4 groups: CIBERSORT, MCPcounter+EPIC+xCell, TME Signatures, GO/KEGG/Hallmark.
Input: CIBERSORT + MCPcounter + ESTIMATE variables. Use pheatmap on Spearman correlation matrix.
load("04-figs/data/06-cor_matrix_full.RData")
# Deduplicate: keep .x (cibersort), drop .y (cibersort_abs)
keep <- !grepl("\\.[xy]$", rownames(cor_mat)) | grepl("\\.x$", rownames(cor_mat))
cor_mat <- cor_mat[keep, keep]
rownames(cor_mat) <- sub("\\.x$", "", rownames(cor_mat))
colnames(cor_mat) <- sub("\\.x$", "", colnames(cor_mat))
for (fmt in c("pdf", "png")) {
pheatmap::pheatmap(cor_mat, fontsize = 6, fontsize_row = 5, fontsize_col = 5,
color = colorRampPalette(c("#2166AC", "white", "#B2182B"))(100),
border_color = NA, main = "TME Cell Correlation Matrix",
filename = paste0("04-figs/Fig05-cor_matrix.", fmt),
width = 8, height = 7, dpi = 300)
}Heatmap dimension rules (MUST follow):
| Element | Position | Parameter |
|---|---|---|
| Variables (cell types) | Rows (y-axis) | rownames(mat) — shows readable names |
| Samples | Columns (x-axis) | show_colnames=FALSE |
| Subtype color bar | Top (column annotation) | annotation_col |
| Subtype gaps | Column gaps | gaps_col |
| Clustering | Rows only | cluster_rows=TRUE, cluster_cols=FALSE |
plot_subtype_heatmap <- function(data, cols, subtype_col, title, prefix,
width = 8, height = 6) {
data <- data[order(data[[subtype_col]]), ]
mat <- t(as.matrix(data[, cols])) # variables -> rows, samples -> columns
mat[mat > 2] <- 2; mat[mat < -2] <- -2
rownames(mat) <- sub("\\.[xy]$", "", rownames(mat))
colnames(mat) <- data$ID
ann_col <- data.frame(Subtype = data[[subtype_col]], row.names = data$ID)
subtypes <- sort(unique(data[[subtype_col]]))
sub_colors <- c("#E64B35", "#4DBBD5", "#00A087", "#3C5488", "#F39B7F", "#8491B6")
ann_colors <- list(Subtype = setNames(sub_colors[seq_along(subtypes)], subtypes))
gaps <- cumsum(table(data[[subtype_col]]))[-length(subtypes)]
for (fmt in c("pdf", "png")) {
pheatmap::pheatmap(mat, annotation_col = ann_col, annotation_colors = ann_colors,
color = colorRampPalette(c("#2166AC", "white", "#B2182B"))(100),
show_colnames = FALSE, cluster_rows = TRUE, cluster_cols = FALSE,
treeheight_row = 15, gaps_col = gaps, fontsize_row = 8,
main = title,
filename = paste0("04-figs/", prefix, ".", fmt), width = width, height = height, dpi = 300)
}
}
# Fig06a: CIBERSORT by subtype
hm_data_cb <- tme_scaled[, c("ID", cb_sc, "TME_subtype")]
plot_subtype_heatmap(hm_data_cb, cb_sc, "TME_subtype",
"CIBERSORT TME by Subtype", "Fig06a-heatmap_cibersort_subtype")
# Fig06b: MCPcounter by subtype
hm_data_mcp <- tme_scaled[, c("ID", mcp_sc, "TME_subtype")]
plot_subtype_heatmap(hm_data_mcp, mcp_sc, "TME_subtype",
"MCPcounter TME by Subtype", "Fig06b-heatmap_mcpcounter_subtype", width = 6, height = 5)
# Fig06c: xCell by subtype
xc_sc_sub <- intersect(xc_sc, colnames(tme_scaled))
hm_data_xc <- tme_scaled[, c("ID", xc_sc_sub, "TME_subtype")]
plot_subtype_heatmap(hm_data_xc, xc_sc_sub, "TME_subtype",
"xCell TME by Subtype", "Fig06c-heatmap_xcell_subtype", width = 10, height = 8)
# Fig06d: KM plot by subtype (IF survival data exists)
# sig_surv_plot does NOT work with arbitrary categorical variables.
# Use survminer directly for KM by TME subtype.
if (length(time_cols) > 0 && length(status_cols) > 0) {
library(survival)
library(survminer)
surv_data <- read.csv("04-figs/data/01-tme_pdata_merged.csv")
dir.create("04-figs/kmplot", showWarnings = FALSE, recursive = TRUE)
surv_obj <- Surv(time = surv_data[[time_col]], event = surv_data[[status_col]])
km_fit <- survfit(surv_obj ~ TME_subtype, data = surv_data)
p_km <- ggsurvplot(km_fit, data = surv_data,
pval = TRUE, conf.int = FALSE,
palette = c("#E64B35", "#4DBBD5", "#00A087", "#3C5488", "#F39B7F", "#8491B6"),
legend.title = "TME Subtype", legend.labs = sort(unique(surv_data$TME_subtype)),
xlab = "Time (months)", ylab = "Overall Survival",
risk.table = TRUE, fontsize = 3, surv.median.line = "hv")
pdf("04-figs/Fig06d-km_subtype.pdf", width = 7, height = 6)
print(p_km)
dev.off()
png("04-figs/Fig06d-km_subtype.png", width = 7, height = 6, units = "in", res = 300)
print(p_km)
dev.off()
}pheatmap notes:
- Do NOT re-scale — data is already z-scored in Phase 4. Just cap at ±2.
treeheight_row = 15— shrink clustering tree.- Must transpose — pheatmap needs variables as rows for readable y-axis labels.
cluster_cols = FALSE— columns ordered by subtype, clustering would destroy ordering.- CRITICAL: Do NOT use
grep("CIBERSORT")in merged data — cibersort + cibersort_abs share cell names. Read from original cibersort CSV instead.
For each method group, find the most significant and positive variable for each TME subtype, create one sig_box per subtype, and combine with patchwork.
library(patchwork)
find_top_positive_per_subtype <- function(data, cell_cols, subtype_col = "TME_subtype") {
subtypes <- sort(unique(data[[subtype_col]]))
result <- character(0)
for (st in subtypes) {
st_data <- data[data[[subtype_col]] == st, cell_cols, drop = FALSE]
cell_means <- colMeans(st_data, na.rm = TRUE)
result[st] <- names(sort(cell_means, decreasing = TRUE))[1]
}
return(result)
}
make_subtype_boxplot_panel <- function(data, method_cols, subtype_col, title, filename) {
top_cells <- find_top_positive_per_subtype(data, method_cols, subtype_col)
subtypes <- names(top_cells)
box_list <- lapply(subtypes, function(st) {
col <- top_cells[st]
clean_name <- sub("\\.[xy]$", "", col)
sig_box(data = data, signature = col, variable = subtype_col,
palette = "paired1", show_pairwise_p = TRUE,
angle_x_text = 60, hjust = 1,
size_of_pvalue = 3, size_of_font = 5) +
ggtitle(paste0(st, ": ", clean_name)) +
theme(axis.title = element_text(size = 6),
axis.text.y = element_text(size = 5),
plot.title = element_text(size = 7),
plot.margin = margin(12, 5, 5, 5))
})
p <- wrap_plots(box_list, ncol = length(subtypes))
ggsave(paste0("04-figs/", filename, ".png"), p,
width = 50 * length(subtypes), height = 100, units = "mm", dpi = 300)
ggsave(paste0("04-figs/", filename, ".pdf"), p,
width = 50 * length(subtypes), height = 100, units = "mm")
}
# Fig07a: CIBERSORT
make_subtype_boxplot_panel(tme_scaled, cb_sc, "TME_subtype",
"CIBERSORT Subtype Boxplots", "Fig07a-boxplot_cibersort_subtype")
# Fig07b: MCPcounter
make_subtype_boxplot_panel(tme_scaled, mcp_sc, "TME_subtype",
"MCPcounter Subtype Boxplots", "Fig07b-boxplot_mcpcounter_subtype")
# Fig07c: xCell
make_subtype_boxplot_panel(tme_scaled, xc_sc, "TME_subtype",
"xCell Subtype Boxplots", "Fig07c-boxplot_xcell_subtype")
# Fig07d: TME signatures
# Identify TME signature columns from Phase 3 output
tme_sig_sc <- colnames(tme_scaled)[colnames(tme_scaled) %in% sig_tme_names]
if (length(tme_sig_sc) > 0) {
make_subtype_boxplot_panel(tme_scaled, tme_sig_sc, "TME_subtype",
"TME Signatures Subtype Boxplots", "Fig07d-boxplot_tme_signature_subtype")
}
# Fig07e: GO/KEGG/Hallmark
pathway_sc <- grep("^HALLMARK_|^KEGG_|^GO_|^REACTOME_", colnames(tme_scaled), value = TRUE)
if (length(pathway_sc) > 0) {
make_subtype_boxplot_panel(tme_scaled, pathway_sc, "TME_subtype",
"Pathway Subtype Boxplots", "Fig07e-boxplot_pathway_subtype")
}For each binary variable: batch_wilcoxon() → top 10 → sig_box + patchwork (2×5).
For each multi-level categorical variable: batch_kruskal() → top 10 → sig_box + patchwork (2×5).
pdata_summary <- read.csv("02-input/pdata_summary.csv")
cat_vars <- pdata_summary$var[pdata_summary$type %in% c("factor", "binary")]
fig_idx <- 0
if (length(cat_vars) > 0) {
# Collect ALL TME variables (deconvolution + signatures + pathways)
# MUST filter to NUMERIC columns only — batch_wilcoxon/batch_kruskal require numeric features
pdata_col_set <- colnames(pdata)
all_tme_vars <- setdiff(colnames(tme_pdata), pdata_col_set)
all_tme_vars <- setdiff(all_tme_vars, "ID")
# Keep only columns that are numeric
all_tme_vars <- all_tme_vars[sapply(tme_pdata[, all_tme_vars], is.numeric)]
for (v in cat_vars) {
lvls <- unique(tme_pdata[[v]])
lvls <- lvls[!is.na(lvls)]
if (length(lvls) >= 2) {
fig_idx <- fig_idx + 1
fig_letter <- letters[fig_idx] # a, b, c, ...
if (length(lvls) == 2) {
# Binary: Wilcoxon
diff_res <- batch_wilcoxon(data = tme_pdata, target = v, feature = all_tme_vars)
} else {
# Multi-level: Kruskal
diff_res <- batch_kruskal(data = tme_pdata, target = v, feature = all_tme_vars)
}
diff_res <- diff_res[order(diff_res$p.value), ]
top10 <- head(diff_res, 10)
write.csv(diff_res, file = paste0("04-figs/data/04-diff_", v, ".csv"), row.names = FALSE)
if (nrow(top10) > 0) {
box_list <- lapply(seq_len(nrow(top10)), function(i) {
sig_box(data = tme_pdata, signature = top10$sig_names[i],
variable = v, palette = "paired1", show_pairwise_p = TRUE,
angle_x_text = 60, hjust = 1,
size_of_pvalue = 3, size_of_font = 5) +
theme(axis.title = element_text(size = 5),
axis.text.y = element_text(size = 5),
plot.title = element_text(size = 5),
plot.margin = margin(12, 5, 5, 5))
})
p_combined <- wrap_plots(box_list, ncol = 5, nrow = 2)
ggsave(paste0("04-figs/Fig08", fig_letter, "-top10_", v, ".png"), p_combined,
width = 250, height = 220, units = "mm", dpi = 300)
ggsave(paste0("04-figs/Fig08", fig_letter, "-top10_", v, ".pdf"), p_combined,
width = 250, height = 220, units = "mm")
}
}
}
}Key notes:
- Binary variable (2 levels): use
batch_wilcoxon(). - Multi-level variable (≥3 levels): use
batch_kruskal(). - Both return tibble with columns:
sig_names,p.value,statistic,p.adj,log10pvalue,stars. - One figure per categorical variable: Fig08a for variable1, Fig08b for variable2, etc.
Select most positive (lowest HR, protective) and most negative (highest HR, risk) variable from each of 4 groups:
- CIBERSORT → a (most positive), b (most negative)
- Other TME methods → c, d
- TME signatures → e, f
- GO/KEGG/Hallmark → g, h
Total: up to 8 KM plots saved to 04-figs/kmplot/.
if (length(time_cols) > 0 && length(status_cols) > 0) {
dir.create("04-figs/kmplot", showWarnings = FALSE, recursive = TRUE)
find_extreme_hr <- function(surv_res, group_cols) {
if (is.null(surv_res) || nrow(surv_res) == 0) return(character(0))
group_res <- surv_res[surv_res$ID %in% group_cols, ]
if (nrow(group_res) == 0) return(character(0))
group_res <- group_res[complete.cases(group_res[, c("HR", "P")]), ]
if (nrow(group_res) == 0) return(character(0))
# Most positive (lowest HR = protective), most negative (highest HR = risk)
most_pos <- group_res$ID[which.min(group_res$HR)]
most_neg <- group_res$ID[which.max(group_res$HR)]
return(c(most_pos, most_neg))
}
# Collect all survival results
surv_all <- rbind(cb_surv, other_surv, tme_sig_surv, pathway_surv)
km_vars <- character(0)
# Group 1: CIBERSORT
km_vars <- c(km_vars, find_extreme_hr(surv_all, cb_sc))
# Group 2: Other TME methods
other_sc <- c(mcp_sc, epic_sc, xc_sc)
km_vars <- c(km_vars, find_extreme_hr(surv_all, other_sc))
# Group 3: TME signatures
km_vars <- c(km_vars, find_extreme_hr(surv_all, tme_sig_cols))
# Group 4: GO/KEGG/Hallmark
km_vars <- c(km_vars, find_extreme_hr(surv_all, pathway_cols))
km_vars <- unique(km_vars[nchar(km_vars) > 0])
surv_data <- read.csv("04-figs/data/01-tme_pdata_merged.csv")
for (i in seq_along(km_vars)) {
sig_surv_plot(input_pdata = surv_data, signature = km_vars[i],
time = time_col, status = status_col, time_type = "month",
palette = "jama", mini_sig = "score",
fig.type = "pdf", save_path = "04-figs/kmplot",
project = paste0("KM_", i), index = i)
}
}Key notes:
sig_surv_plot()saves its own output — Do NOT wrap withggsave().- KM plots saved to
04-figs/kmplot/— each variable generates PDF + RData. mini_sig = "score"— continuous variable, auto-split by median.time_type:"day","month", or"year".
if (file.exists("Rplots.pdf")) file.remove("Rplots.pdf")
Use these as needed based on analysis requirements:
| Function | Purpose | Output |
|---|---|---|
roc_time() |
Time-dependent ROC with AUC at multiple time points | Figure (self-saving) |
sig_roc() |
Multiple ROC curves for binary response prediction | Figure |
batch_sig_surv_plot() |
Batch KM curves for multiple signatures across projects | Figures (self-saving) |
surv_group() |
KM curve with risk table (High vs Low) | Figure |
subgroup_survival() |
Subgroup Cox analysis by covariates | Statistics table |
get_cor() |
Single pairwise correlation with scatter plot + regression line | Figure |
iobr_cor_plot() |
Batch correlation visualization (signature vs signature/phenotype) | Figure |
iobr_pca() |
PCA dimensionality reduction with scatter plot | Figure |
iobr_deg() |
DEG with built-in volcano plot and heatmap output | Statistics + figures |
sig_gsea() |
Gene Set Enrichment Analysis via fgsea, with built-in plots | Statistics + figures |
LR_cal() |
Ligand-receptor interaction analysis | Statistics + figures |
# Time-dependent ROC — saves its own output
roc_time(input = tme_pdata, vars = c("T_cells_CD8_CIBERSORT", "TMEscoreA_CIR"),
time = "OS_time", status = "OS_status",
time_point = c(12, 36, 60), time_type = "month", palette = "jama",
path = "04-figs", fig.type = "png", index = 1)
# PCA visualization
iobr_pca(data = eset, pdata = pdata, group = "TME_subtype", scale = TRUE)
# Single correlation scatter plot
get_cor(data = tme_pdata, variable1 = "T_cells_CD8_CIBERSORT",
variable2 = "CD_8_T_effector", method = "spearman")Categorical (palette_group parameter):
| Name | Best For |
|---|---|
"jama" |
Default, 2–5 groups |
"jco" |
Clinical oncology |
"nrc" |
Nature publications |
"aaas" |
Science journal |
"accent" |
Colorblind-friendly |
"set2" |
Many groups |
Heatmap (palette parameter, numeric 1–6):
| # | Style |
|---|---|
| 2 | Blue-white-red diverging (default) |
| 1 | Classic pheatmap |
| 3 | Purple-orange |
| 4–6 | Sequential gradients |
Cell bar plot (cell_bar_plot() palette parameter, numeric 1–4):
| # | Style |
|---|---|
| 4 | Recommended — rich random palette with many distinct colors |
| 1–3 | Alternative random palettes |
If unsure which palette, present 3 options to the user and ask.
Pause and ask at:
- Phase 0: Review and approve analysis plan (
IOBR-pipeline.md) - Phase 1: Data type (Count/TPM/Array) and species (hsa/mmu)
- Phase 2: Which deconvolution method(s)
- Phase 3: Scoring method and which signature collection
- Phase 5: Color palette preference
Do NOT skip this phase. After Phase 5 visualization finishes and all log files are verified, you MUST:
- List actual files in
04-figs/,03-tme/,02-input/, and04-figs/data/to build the tree from reality - Write
05-note/IOBR-analysis-README.mdwith:- IOBR citation header at top
- ASCII tree diagram showing ACTUAL outputs (not template placeholders)
- Brief interpretation of key findings (deconvolution method count, subtype count, top survival hits, available figures)
- Count figures, tables, and log files — report these numbers in the summary
Before writing the note, scan the output:
# Count figures generated
echo "=== Figures ===" && ls 04-figs/Fig*.png | wc -l
# Deconvolution methods succeeded
echo "=== TME CSVs ===" && ls 03-tme/tme_*.csv
# Survival data availability
echo "=== Survival ===" && ls 04-figs/data/05-surv_*.csv 2>/dev/null || echo "No survival data"
# Subtype info
echo "=== Subtypes ===" && head -5 04-figs/data/03-tme_subtype.csv
# Log file check
echo "=== Logs ===" && wc -l 06-log/*.log# IOBR Citation:
# Zeng DQ, Fang YR, … , Yu GC, Liao WJ.
# Enhancing Immuno-Oncology Investigations Through Multidimensional Decoding of
# Tumour Microenvironment with IOBR 2.0, Cell Reports Methods, 2024
# https://doi.org/10.1016/j.crmeth.2024.100910
# <Project Name> — TME Analysis Summary
## Analysis Summary
- Dataset: <N> samples, <data_type>
- Deconvolution: <methods used, comma-separated>
- Signature scoring: <method>
- TME subtypes: <N> subtypes identified (K-means on CIBERSORT)
## Key Findings
<3-5 bullet points summarizing the most important results>
## Output Directory Tree
project/ ├── 01-script/ │ ├── 01-data_preprocessing.R │ ├── 02-tme_deconvolution.R │ ├── 03-signature_analysis.R │ ├── 04-statistical_analysis.R │ └── 05-visualization.R ├── 02-input/ │ ├── annotated_eset.csv │ ├── pdata.csv │ └── pdata_summary.csv ├── 03-tme/ │ ├── tme_cibersort.csv # CIBERSORT: 22 cell types │ ├── tme_mcpcounter.csv # MCPcounter: 8 cell populations │ ├── tme_estimate.csv # ESTIMATE: 4 scores │ ├── sig_score_tme.csv # TME signature scores (ssGSEA) │ └── tme_sig_combine.csv # Merged master table ├── 04-figs/ │ ├── Fig01-barplot_cibersort.png/pdf │ ├── Fig02-barplot_epic.png/pdf │ ├── Fig03a-heatmap_cibersort.png/pdf │ ├── Fig03b-heatmap_mcpcounter.png/pdf │ ├── Fig03c-heatmap_xcell.png/pdf │ ├── Fig04a-forest_cibersort.png/pdf # (IF survival) │ ├── Fig04b-forest_other_tme.png/pdf # (IF survival) │ ├── Fig04c-forest_tme_signature.png/pdf # (IF survival) │ ├── Fig04d-forest_go_kegg.png/pdf # (IF survival) │ ├── Fig05-cor_matrix.png/pdf │ ├── Fig06a-heatmap_cibersort_subtype.png/pdf │ ├── Fig06b-heatmap_mcpcounter_subtype.png/pdf │ ├── Fig06c-heatmap_xcell_subtype.png/pdf │ ├── Fig06d-km_subtype.png/pdf # (IF survival) │ ├── Fig07a-boxplot_cibersort_subtype.png/pdf │ ├── Fig07b-boxplot_mcpcounter_subtype.png/pdf │ ├── Fig07c-boxplot_xcell_subtype.png/pdf │ ├── Fig07d-boxplot_tme_signature_subtype.png/pdf │ ├── Fig07e-boxplot_pathway_subtype.png/pdf │ ├── Fig08a-top10_.png/pdf # (IF categorical) │ ├── Fig08b-top10_.png/pdf # (IF categorical) │ ├── kmplot/ # KM plots (IF survival) │ └── data/ # Statistical result tables ├── 05-note/ │ ├── IOBR-pipeline.md │ └── IOBR-analysis-README.md └── 06-log/ ├── 01-data_preprocessing.log ├── 02-tme_deconvolution.log ├── 03-signature_analysis.log ├── 04-statistical_analysis.log └── 05-visualization.log
After writing, present a one-paragraph summary to the user listing figure count, log status, and key findings.
Every R script MUST be executed with log capture. No exceptions.
- One log file per script, named to match:
01-data_preprocessing.log,02-tme_deconvolution.log, ... - Each log MUST begin with the IOBR citation header (same as scripts).
- Logs capture both stdout and stderr for debugging and reproducibility.
- After running all scripts, verify:
ls -la 06-log/should show 5 non-empty log files. - If a script fails, check the log file for the error message.
Run EVERY script with this pattern:
mkdir -p 06-log
# Phase 1
{ echo "# IOBR Citation:"; echo "# Zeng DQ et al., Cell Reports Methods, 2024"; echo "# https://doi.org/10.1016/j.crmeth.2024.100910"; echo ""; Rscript 01-script/01-data_preprocessing.R; } > 06-log/01-data_preprocessing.log 2>&1
# Phase 2
{ echo "# IOBR Citation:"; echo "# Zeng DQ et al., Cell Reports Methods, 2024"; echo "# https://doi.org/10.1016/j.crmeth.2024.100910"; echo ""; Rscript 01-script/02-tme_deconvolution.R; } > 06-log/02-tme_deconvolution.log 2>&1
# Phase 3
{ echo "# IOBR Citation:"; echo "# Zeng DQ et al., Cell Reports Methods, 2024"; echo "# https://doi.org/10.1016/j.crmeth.2024.100910"; echo ""; Rscript 01-script/03-signature_analysis.R; } > 06-log/03-signature_analysis.log 2>&1
# Phase 4
{ echo "# IOBR Citation:"; echo "# Zeng DQ et al., Cell Reports Methods, 2024"; echo "# https://doi.org/10.1016/j.crmeth.2024.100910"; echo ""; Rscript 01-script/04-statistical_analysis.R; } > 06-log/04-statistical_analysis.log 2>&1
# Phase 5
{ echo "# IOBR Citation:"; echo "# Zeng DQ et al., Cell Reports Methods, 2024"; echo "# https://doi.org/10.1016/j.crmeth.2024.100910"; echo ""; Rscript 01-script/05-visualization.R; } > 06-log/05-visualization.log 2>&1- Check IOBR installation in base environment
- Create project directory structure (01-script through 06-log + 04-figs/data + 04-figs/kmplot)
- Confirm data source, type, and species with user
- Generate
05-note/IOBR-pipeline.md— present to user for approval - Write
01-data_preprocessing.R(with citation header), run with log capture to06-log/ - Verify sample matching (eset vs pdata), generate
02-input/pdata_summary.csv - Ask user to choose deconvolution method(s)
- Write
02-tme_deconvolution.R, run with log capture to06-log/ - Ask user about scoring method and gene sets
- Write
03-signature_analysis.R, run with log capture to06-log/ - Write
04-statistical_analysis.R— merge, scale, cluster, test, correlate, save to 04-figs/data/ - Write
05-visualization.R— follow Fig01→09 ordering with conditional figures (survival, categorical) - Run
05-visualization.Rwith log capture to06-log/ - Phase 6 (MANDATORY): List actual outputs → count figures/tables/logs → extract key findings → write
05-note/IOBR-analysis-README.mdwith tree + summary - Verify all 5 log files in
06-log/are non-empty with citation headers
references/functions.md— IOBR function parameter referencereferences/palettes.md— Color palette selection guidereferences/iobr_built_in_data.md— Complete catalog of built-in signatures, annotations, and datasetsreferences/iobr_pipeline_template.R— Full production pipeline template (9-method deconvolution + signature + pathway scoring)
Read these when you need specific parameter details, palette options, want to know what signatures are available, or need a complete pipeline template to adapt.