|
| 1 | +# Integration Review Plan |
| 2 | +<!-- Hand-authored in the /review-integration plan schema. Edit freely, but preserve chunk IDs and status values. --> |
| 3 | + |
| 4 | +## Metadata |
| 5 | +- **Kind**: `integration` |
| 6 | +- **Package**: NMFMerge |
| 7 | +- **Source review date**: 2026-05-21 |
| 8 | +- **Current version**: 1.1.1 |
| 9 | +- **Issue**: n/a |
| 10 | + |
| 11 | +## Stated values |
| 12 | +The goal of this plan is to make NMFMerge compilable into a C-ABI shared |
| 13 | +library by `juliac --trim`, so it can be called from Python/Matlab/R (the |
| 14 | +`teh/wrappers` branch). Guiding preferences: |
| 15 | + |
| 16 | +- Fail-fast over silent continuation: surfacing errors matters more than |
| 17 | + papering over them. Trimming strips error-message machinery, which is in |
| 18 | + tension with this — flag, do not silently accept, any loss of diagnosability. |
| 19 | +- Keep the flexible interactive `nmfmerge` API intact for Julia users; the |
| 20 | + compiled library may use a separate, deliberately monomorphic entrypoint. |
| 21 | +- Upstream trim-cleanliness fixes (NMF, TSVD, NonNegLeastSquares) are |
| 22 | + preferred over local vendored patches where the maintainers will accept them. |
| 23 | +- Annotate arguments only as specifically as the implementation requires; |
| 24 | + `::Function` and `::Matrix{Float64}` over-restriction is a smell. |
| 25 | + |
| 26 | +## Release strategy |
| 27 | +- **Pre-breaking-release**: `n/a (no breaking changes planned)` — the |
| 28 | + recommended approach (CHUNK-003) adds a new internal entrypoint rather than |
| 29 | + altering `nmfmerge`. Revisit if CHUNK-003 chooses to refactor the public API. |
| 30 | +- **Inter-cluster releases**: `n/a` |
| 31 | + |
| 32 | +## Background: the `--trim` build |
| 33 | + |
| 34 | +A first `juliac --trim=safe` build was run 2026-05-21 with Julia 1.13.0-rc1 |
| 35 | +(`julia +rc`; `juliac` at `~/.julia/bin/juliac`). It loaded every dependency, |
| 36 | +ran the trim verifier, and emitted a clean `build/nmfmerge_abi.json` — then |
| 37 | +failed with **92 verifier errors**, followed by a `juliac` crash in its own |
| 38 | +failure path (`UndefVarError(:exit)` in `Base.Compiler`). |
| 39 | + |
| 40 | +Build command (run from the package root): |
| 41 | + |
| 42 | +```bash |
| 43 | +juliac --output-lib build/libNMFMerge --export-abi build/nmfmerge_abi.json \ |
| 44 | + --project language_wrappers --compile-ccallable --trim=safe --experimental \ |
| 45 | + language_wrappers/lib.jl |
| 46 | +``` |
| 47 | + |
| 48 | +Build harness gotcha: `juliac` copies the `--project` into `/tmp`, so a |
| 49 | +relative `[sources]` path-dep (`NMFMerge = {path = ".."}`) cannot be resolved. |
| 50 | +`language_wrappers/Manifest.toml` was hand-edited to an absolute path as a |
| 51 | +workaround; `language_wrappers/Project.toml` is unchanged (relative). Re-running |
| 52 | +the build needs that Manifest edit (or an equivalent fix). |
| 53 | + |
| 54 | +Error taxonomy (the 92): |
| 55 | + |
| 56 | +- **~75% are diagnostic output**, dynamically dead but statically reachable: |
| 57 | + NMF's verbose `Printf` iteration tables (~55), `NonNegLeastSquares.warn` |
| 58 | + (8), TSVD's `biLanczos` `debug` `println` (~4). |
| 59 | +- **Constant propagation is lost through `NMF.nnmf`'s keyword interface.** |
| 60 | + NMFMerge calls `nnmf(X, n2; alg, kwargs..., init=:custom, …)`; the |
| 61 | + `kwargs...` splat routed through `Core.kwcall` defeats constprop of `init` |
| 62 | + and `alg`, so the verifier compiles *every* init method (`randinit`, |
| 63 | + `nndsvd`, `spa`) and *every* algorithm (ProjectedALS, ALSPGrad, MultUpdate, |
| 64 | + CoordinateDescent, GreedyCD) — ~6× the necessary surface. Evidence: error #1 |
| 65 | + is `randinit` (the `:random` path, never used); errors reference both |
| 66 | + `GreedyCDUpd` and `ProjectedALSUpd`. |
| 67 | +- **Genuine type instabilities** (minority): NMF's `zeros(T::DataType,…)` / |
| 68 | + `rand(T::DataType,…)` return `Matrix`, not `Matrix{T}`; GsvdInitialization's |
| 69 | + `Kronecker`+`SparseArrays` `hvcat`/`vcat` and `nonneg_lsq`'s `:fnnls` |
| 70 | + Symbol-keyword dispatch. |
| 71 | +- **NMFMerge's own code** contributes a small cluster: `colmerge2to1pq` / |
| 72 | + `pqupdate2to1!` (`src/NMFMerge.jl:127`, `:137`, `:140`), including the |
| 73 | + `::Function` annotation on `pqupdate2to1!`'s `queuepenalty` argument. |
| 74 | +- One error is a `juliac` limitation: `Core.invoke_in_world` — "trim |
| 75 | + verification not yet implemented for builtin". |
| 76 | + |
| 77 | +## Decisions |
| 78 | +<!-- Answers to `decide` chunks land here, with the chunk ID. --> |
| 79 | + |
| 80 | +## Chunks |
| 81 | + |
| 82 | +### CHUNK-001: preflight |
| 83 | +- **Kind**: `preflight` |
| 84 | +- **Description**: Establish baseline — `git status` clean, `Pkg.test()` passes, `Test.detect_ambiguities(NMFMerge)` count, current `Project.toml` version. Additionally re-run the `--trim` build (see Background) and record the verifier error count as the trim baseline. Record all results in this chunk's Notes. |
| 85 | +- **Status**: `not-started` |
| 86 | +- **Notes**: |
| 87 | + |
| 88 | +### CHUNK-002: decide-algorithm-to-pin |
| 89 | +- **Kind**: `decide` |
| 90 | +- **Description**: The compiled library must commit to a single NMF algorithm (the cause of the constprop blow-up is forwarding `alg` as a runtime value). `lib.jl` currently lets it default to `:greedycd`; NMFMerge's own tests almost all use `:cd` (CoordinateDescent). Decide which algorithm the compiled entrypoint pins. Recommendation: `:cd`, matching the tested path and `runtests.jl:121`. |
| 91 | +- **Status**: `not-started` |
| 92 | +- **Notes**: |
| 93 | + |
| 94 | +### CHUNK-003: decide-entrypoint-shape |
| 95 | +- **Kind**: `decide` |
| 96 | +- **Description**: Decide whether to (A) refactor the public `nmfmerge` to drop the `nnmf`/`kwargs...` path, or (B) add a separate internal monomorphic entrypoint (e.g. `_nmfmerge_compiled`) that `lib.jl` calls, leaving `nmfmerge` untouched. Recommendation: B — keeps the flexible API for interactive Julia users and keeps this plan non-breaking. Choosing A makes CHUNK-005 `Breaking: yes` and requires appending a `version-bump` chunk. |
| 97 | +- **Status**: `not-started` |
| 98 | +- **Notes**: |
| 99 | + |
| 100 | +### CHUNK-004: decide-upstream-vs-local |
| 101 | +- **Kind**: `decide` |
| 102 | +- **Description**: The trim-cleanliness fixes for NMF, TSVD, NonNegLeastSquares, and GsvdInitialization (CHUNK-010 .. CHUNK-013) live in other repos. Decide per-package: submit an upstream PR, or carry a local patch (dev'd package / fork pinned in `language_wrappers`). Recommendation: upstream PRs for all four — they are mechanical and broadly useful; fall back to a pinned fork only if a maintainer is unresponsive. |
| 103 | +- **Status**: `not-started` |
| 104 | +- **Notes**: |
| 105 | + |
| 106 | +### CHUNK-005: monomorphic-nmf-path |
| 107 | +- **Kind**: `implement` |
| 108 | +- **Description**: Give NMFMerge a monomorphic NMF path that does not go through `NMF.nnmf`'s keyword interface. Replace the four `nnmf(...)` calls in `nmfmerge` (`src/NMFMerge.jl:47,52,56` and the initial call) with direct `NMF.solve!(NMF.CoordinateDescent{Float64}(; maxiter, tol, …), X, W, H)` calls (per the pinned algorithm from CHUNK-002), performing `:custom` initialization by hand and dropping the `kwargs...` splat. Shape per CHUNK-003: new internal entrypoint (B) or in-place refactor (A). This is the highest-leverage chunk — it should prune five of six algorithms and the unused init methods. Verification: `Pkg.test()` still passes, and the `--trim` build's error count drops sharply. |
| 109 | +- **Status**: `not-started` |
| 110 | +- **Notes**: |
| 111 | +- **Depends on**: CHUNK-002, CHUNK-003 |
| 112 | + |
| 113 | +### CHUNK-006: fix-nmfmerge-own-trim-errors |
| 114 | +- **Kind**: `implement` |
| 115 | +- **Description**: Fix NMFMerge's own verifier errors in `colmerge2to1pq` / `pqupdate2to1!` (`src/NMFMerge.jl:110-153`). Remove the abstract `::Function` annotation on `pqupdate2to1!`'s `queuepenalty` argument; stabilize the merge-loop result types and the `reduce(hcat, …)` at `:140`. Verification: `Pkg.test()` passes; these errors gone from the trim log. |
| 116 | +- **Status**: `not-started` |
| 117 | +- **Notes**: |
| 118 | + |
| 119 | +### CHUNK-007: modernize-makefile |
| 120 | +- **Kind**: `implement` |
| 121 | +- **Description**: `language_wrappers/Makefile` uses the obsolete `juliac` interface (`--output-o`, a `.a` archive, `bindinginfo_*.log`). Rewrite it around the current flow: `juliac --output-lib --export-abi --compile-ccallable --trim=safe --experimental` (see Background for the exact command). Resolve the relative-`[sources]`-path issue robustly rather than relying on the hand-edited Manifest. |
| 122 | +- **Status**: `not-started` |
| 123 | +- **Notes**: |
| 124 | + |
| 125 | +### CHUNK-008: harden-lib-entrypoint |
| 126 | +- **Kind**: `implement` |
| 127 | +- **Description**: `language_wrappers/lib.jl`'s `nmfmerge_inplace` validates `Wout.rows == X.rows` and `Hout.cols == X.cols` but not `Wout.cols` / `Hout.rows` against the component count — a mis-sized output throws `DimensionMismatch` across the C ABI. Add the missing checks and decide an error-propagation convention (status field / error code) so failures surface as something a Python/C caller can inspect rather than a process abort. Reference the fail-fast value in Stated values. |
| 128 | +- **Status**: `not-started` |
| 129 | +- **Notes**: |
| 130 | + |
| 131 | +### CHUNK-009: reinventory-residual-errors |
| 132 | +- **Kind**: `investigate` |
| 133 | +- **Description**: After the NMFMerge-side fixes, re-run the `--trim` build and produce an accurate inventory of the residual errors, attributed per upstream package. Use it to tighten the scope (and dependency order) of CHUNK-010 .. CHUNK-013 — some may shrink dramatically once only the pinned algorithm's code path is compiled. Record the residual count and per-package breakdown in Notes. |
| 134 | +- **Status**: `not-started` |
| 135 | +- **Notes**: |
| 136 | +- **Depends on**: CHUNK-005, CHUNK-006 |
| 137 | + |
| 138 | +### CHUNK-010: trim-clean-nmf |
| 139 | +- **Kind**: `implement` |
| 140 | +- **Description**: Make NMF.jl trim-clean for the pinned algorithm's path: guard the verbose `Printf` iteration-table output so it is statically eliminable when `verbose=false`, and fix the `zeros(T::DataType,…)` / `rand(T::DataType,…)` type instability in `randinit` (`NMF/src/initialization.jl:5-7`). Delivery per CHUNK-004. |
| 141 | +- **Status**: `not-started` |
| 142 | +- **Notes**: |
| 143 | +- **Depends on**: CHUNK-009, CHUNK-004 |
| 144 | +- **Cluster**: upstream-trim |
| 145 | + |
| 146 | +### CHUNK-011: trim-clean-tsvd |
| 147 | +- **Kind**: `implement` |
| 148 | +- **Description**: Make TSVD.jl trim-clean: the `debug` `println` in `biLanczos` (`TSVD/src/svd.jl:177`) is statically reachable. Guard it so it is eliminable when `debug=false`. Delivery per CHUNK-004. |
| 149 | +- **Status**: `not-started` |
| 150 | +- **Notes**: |
| 151 | +- **Depends on**: CHUNK-009, CHUNK-004 |
| 152 | +- **Cluster**: upstream-trim |
| 153 | + |
| 154 | +### CHUNK-012: trim-clean-nonneglsq |
| 155 | +- **Kind**: `implement` |
| 156 | +- **Description**: Make NonNegLeastSquares.jl trim-clean: the `warn(...)` calls on unrecognized algorithm variants are statically reachable. Guard or restructure them. Delivery per CHUNK-004. |
| 157 | +- **Status**: `not-started` |
| 158 | +- **Notes**: |
| 159 | +- **Depends on**: CHUNK-009, CHUNK-004 |
| 160 | +- **Cluster**: upstream-trim |
| 161 | + |
| 162 | +### CHUNK-013: fix-gsvdinit-type-instability |
| 163 | +- **Kind**: `implement` |
| 164 | +- **Description**: Fix GsvdInitialization's genuine type instabilities surfaced by the verifier: the `Kronecker`+`SparseArrays` `hvcat`/`vcat` path in `gram_sp_C`, and the `nonneg_lsq` `Core.kwcall((alg=:fnnls, gram=true), …)` Symbol-keyword dispatch (`GsvdInitialization/src/GsvdInitialization.jl:~128,199-203`). These are algorithmic, not diagnostic — they need real type-stability work. Delivery per CHUNK-004. |
| 165 | +- **Status**: `not-started` |
| 166 | +- **Notes**: |
| 167 | +- **Depends on**: CHUNK-009, CHUNK-004 |
| 168 | +- **Cluster**: upstream-trim |
| 169 | + |
| 170 | +### CHUNK-014: report-juliac-bugs |
| 171 | +- **Kind**: `investigate` |
| 172 | +- **Description**: Report two `juliac`/Julia-1.13-rc findings upstream (JuliaC.jl / julia): (1) the crash in `juliac`'s failure path — `UndefVarError(:exit)` in `Base.Compiler` after the verifier reports errors; (2) `Core.invoke_in_world` — "trim verification not yet implemented for builtin". Record issue links in Notes. |
| 173 | +- **Status**: `not-started` |
| 174 | +- **Notes**: |
| 175 | + |
| 176 | +### CHUNK-015: end-to-end-verification |
| 177 | +- **Kind**: `investigate` |
| 178 | +- **Description**: Terminal verification: confirm the `--trim` build completes cleanly and emits `libNMFMerge` + `nmfmerge_abi.json`; run that JSON through JuliaLibWrapping (`write_wrapper` for `CTarget` and `PythonTarget`); smoke-test the generated C header and Python package against the compiled library. Record what works and any remaining friction. |
| 179 | +- **Status**: `not-started` |
| 180 | +- **Notes**: |
| 181 | +- **Depends on**: CHUNK-005, CHUNK-006, CHUNK-007, CHUNK-008, CHUNK-010, CHUNK-011, CHUNK-012, CHUNK-013 |
| 182 | + |
| 183 | +## Session ledger |
| 184 | +<!-- The implementer appends one line after each session: `- YYYY-MM-DD CHUNK-XXX (name) → next: CHUNK-YYY` --> |
| 185 | + |
| 186 | +## Open Questions |
| 187 | +- Coordination: once CHUNK-010 .. CHUNK-013 land as upstream releases, NMFMerge's |
| 188 | + `[compat]` bounds must be bumped to require the trim-clean versions, and |
| 189 | + NMFMerge itself likely needs a release. Sequence this after the |
| 190 | + `upstream-trim` cluster completes. |
| 191 | +- The compiled library carries the whole Julia runtime; shipping it to users |
| 192 | + without Julia (the actual Python/Matlab/R distribution story) is out of scope |
| 193 | + for this plan — track it with the JuliaLibWrapping work instead. |
0 commit comments