Skip to content

Commit 0ab3ce6

Browse files
committed
Plan to fix issues
1 parent bbaed3d commit 0ab3ce6

2 files changed

Lines changed: 194 additions & 0 deletions

File tree

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,3 +6,4 @@
66
/docs/build/
77
/language_wrappers/Manifest.toml
88
/language_wrappers/bindinginfo_libNMFMerge.log
9+
build/*

INTEGRATION_REVIEW_PLAN.md

Lines changed: 193 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,193 @@
1+
# Integration Review Plan
2+
<!-- Hand-authored in the /review-integration plan schema. Edit freely, but preserve chunk IDs and status values. -->
3+
4+
## Metadata
5+
- **Kind**: `integration`
6+
- **Package**: NMFMerge
7+
- **Source review date**: 2026-05-21
8+
- **Current version**: 1.1.1
9+
- **Issue**: n/a
10+
11+
## Stated values
12+
The goal of this plan is to make NMFMerge compilable into a C-ABI shared
13+
library by `juliac --trim`, so it can be called from Python/Matlab/R (the
14+
`teh/wrappers` branch). Guiding preferences:
15+
16+
- Fail-fast over silent continuation: surfacing errors matters more than
17+
papering over them. Trimming strips error-message machinery, which is in
18+
tension with this — flag, do not silently accept, any loss of diagnosability.
19+
- Keep the flexible interactive `nmfmerge` API intact for Julia users; the
20+
compiled library may use a separate, deliberately monomorphic entrypoint.
21+
- Upstream trim-cleanliness fixes (NMF, TSVD, NonNegLeastSquares) are
22+
preferred over local vendored patches where the maintainers will accept them.
23+
- Annotate arguments only as specifically as the implementation requires;
24+
`::Function` and `::Matrix{Float64}` over-restriction is a smell.
25+
26+
## Release strategy
27+
- **Pre-breaking-release**: `n/a (no breaking changes planned)` — the
28+
recommended approach (CHUNK-003) adds a new internal entrypoint rather than
29+
altering `nmfmerge`. Revisit if CHUNK-003 chooses to refactor the public API.
30+
- **Inter-cluster releases**: `n/a`
31+
32+
## Background: the `--trim` build
33+
34+
A first `juliac --trim=safe` build was run 2026-05-21 with Julia 1.13.0-rc1
35+
(`julia +rc`; `juliac` at `~/.julia/bin/juliac`). It loaded every dependency,
36+
ran the trim verifier, and emitted a clean `build/nmfmerge_abi.json` — then
37+
failed with **92 verifier errors**, followed by a `juliac` crash in its own
38+
failure path (`UndefVarError(:exit)` in `Base.Compiler`).
39+
40+
Build command (run from the package root):
41+
42+
```bash
43+
juliac --output-lib build/libNMFMerge --export-abi build/nmfmerge_abi.json \
44+
--project language_wrappers --compile-ccallable --trim=safe --experimental \
45+
language_wrappers/lib.jl
46+
```
47+
48+
Build harness gotcha: `juliac` copies the `--project` into `/tmp`, so a
49+
relative `[sources]` path-dep (`NMFMerge = {path = ".."}`) cannot be resolved.
50+
`language_wrappers/Manifest.toml` was hand-edited to an absolute path as a
51+
workaround; `language_wrappers/Project.toml` is unchanged (relative). Re-running
52+
the build needs that Manifest edit (or an equivalent fix).
53+
54+
Error taxonomy (the 92):
55+
56+
- **~75% are diagnostic output**, dynamically dead but statically reachable:
57+
NMF's verbose `Printf` iteration tables (~55), `NonNegLeastSquares.warn`
58+
(8), TSVD's `biLanczos` `debug` `println` (~4).
59+
- **Constant propagation is lost through `NMF.nnmf`'s keyword interface.**
60+
NMFMerge calls `nnmf(X, n2; alg, kwargs..., init=:custom, …)`; the
61+
`kwargs...` splat routed through `Core.kwcall` defeats constprop of `init`
62+
and `alg`, so the verifier compiles *every* init method (`randinit`,
63+
`nndsvd`, `spa`) and *every* algorithm (ProjectedALS, ALSPGrad, MultUpdate,
64+
CoordinateDescent, GreedyCD) — ~6× the necessary surface. Evidence: error #1
65+
is `randinit` (the `:random` path, never used); errors reference both
66+
`GreedyCDUpd` and `ProjectedALSUpd`.
67+
- **Genuine type instabilities** (minority): NMF's `zeros(T::DataType,…)` /
68+
`rand(T::DataType,…)` return `Matrix`, not `Matrix{T}`; GsvdInitialization's
69+
`Kronecker`+`SparseArrays` `hvcat`/`vcat` and `nonneg_lsq`'s `:fnnls`
70+
Symbol-keyword dispatch.
71+
- **NMFMerge's own code** contributes a small cluster: `colmerge2to1pq` /
72+
`pqupdate2to1!` (`src/NMFMerge.jl:127`, `:137`, `:140`), including the
73+
`::Function` annotation on `pqupdate2to1!`'s `queuepenalty` argument.
74+
- One error is a `juliac` limitation: `Core.invoke_in_world` — "trim
75+
verification not yet implemented for builtin".
76+
77+
## Decisions
78+
<!-- Answers to `decide` chunks land here, with the chunk ID. -->
79+
80+
## Chunks
81+
82+
### CHUNK-001: preflight
83+
- **Kind**: `preflight`
84+
- **Description**: Establish baseline — `git status` clean, `Pkg.test()` passes, `Test.detect_ambiguities(NMFMerge)` count, current `Project.toml` version. Additionally re-run the `--trim` build (see Background) and record the verifier error count as the trim baseline. Record all results in this chunk's Notes.
85+
- **Status**: `not-started`
86+
- **Notes**:
87+
88+
### CHUNK-002: decide-algorithm-to-pin
89+
- **Kind**: `decide`
90+
- **Description**: The compiled library must commit to a single NMF algorithm (the cause of the constprop blow-up is forwarding `alg` as a runtime value). `lib.jl` currently lets it default to `:greedycd`; NMFMerge's own tests almost all use `:cd` (CoordinateDescent). Decide which algorithm the compiled entrypoint pins. Recommendation: `:cd`, matching the tested path and `runtests.jl:121`.
91+
- **Status**: `not-started`
92+
- **Notes**:
93+
94+
### CHUNK-003: decide-entrypoint-shape
95+
- **Kind**: `decide`
96+
- **Description**: Decide whether to (A) refactor the public `nmfmerge` to drop the `nnmf`/`kwargs...` path, or (B) add a separate internal monomorphic entrypoint (e.g. `_nmfmerge_compiled`) that `lib.jl` calls, leaving `nmfmerge` untouched. Recommendation: B — keeps the flexible API for interactive Julia users and keeps this plan non-breaking. Choosing A makes CHUNK-005 `Breaking: yes` and requires appending a `version-bump` chunk.
97+
- **Status**: `not-started`
98+
- **Notes**:
99+
100+
### CHUNK-004: decide-upstream-vs-local
101+
- **Kind**: `decide`
102+
- **Description**: The trim-cleanliness fixes for NMF, TSVD, NonNegLeastSquares, and GsvdInitialization (CHUNK-010 .. CHUNK-013) live in other repos. Decide per-package: submit an upstream PR, or carry a local patch (dev'd package / fork pinned in `language_wrappers`). Recommendation: upstream PRs for all four — they are mechanical and broadly useful; fall back to a pinned fork only if a maintainer is unresponsive.
103+
- **Status**: `not-started`
104+
- **Notes**:
105+
106+
### CHUNK-005: monomorphic-nmf-path
107+
- **Kind**: `implement`
108+
- **Description**: Give NMFMerge a monomorphic NMF path that does not go through `NMF.nnmf`'s keyword interface. Replace the four `nnmf(...)` calls in `nmfmerge` (`src/NMFMerge.jl:47,52,56` and the initial call) with direct `NMF.solve!(NMF.CoordinateDescent{Float64}(; maxiter, tol, …), X, W, H)` calls (per the pinned algorithm from CHUNK-002), performing `:custom` initialization by hand and dropping the `kwargs...` splat. Shape per CHUNK-003: new internal entrypoint (B) or in-place refactor (A). This is the highest-leverage chunk — it should prune five of six algorithms and the unused init methods. Verification: `Pkg.test()` still passes, and the `--trim` build's error count drops sharply.
109+
- **Status**: `not-started`
110+
- **Notes**:
111+
- **Depends on**: CHUNK-002, CHUNK-003
112+
113+
### CHUNK-006: fix-nmfmerge-own-trim-errors
114+
- **Kind**: `implement`
115+
- **Description**: Fix NMFMerge's own verifier errors in `colmerge2to1pq` / `pqupdate2to1!` (`src/NMFMerge.jl:110-153`). Remove the abstract `::Function` annotation on `pqupdate2to1!`'s `queuepenalty` argument; stabilize the merge-loop result types and the `reduce(hcat, …)` at `:140`. Verification: `Pkg.test()` passes; these errors gone from the trim log.
116+
- **Status**: `not-started`
117+
- **Notes**:
118+
119+
### CHUNK-007: modernize-makefile
120+
- **Kind**: `implement`
121+
- **Description**: `language_wrappers/Makefile` uses the obsolete `juliac` interface (`--output-o`, a `.a` archive, `bindinginfo_*.log`). Rewrite it around the current flow: `juliac --output-lib --export-abi --compile-ccallable --trim=safe --experimental` (see Background for the exact command). Resolve the relative-`[sources]`-path issue robustly rather than relying on the hand-edited Manifest.
122+
- **Status**: `not-started`
123+
- **Notes**:
124+
125+
### CHUNK-008: harden-lib-entrypoint
126+
- **Kind**: `implement`
127+
- **Description**: `language_wrappers/lib.jl`'s `nmfmerge_inplace` validates `Wout.rows == X.rows` and `Hout.cols == X.cols` but not `Wout.cols` / `Hout.rows` against the component count — a mis-sized output throws `DimensionMismatch` across the C ABI. Add the missing checks and decide an error-propagation convention (status field / error code) so failures surface as something a Python/C caller can inspect rather than a process abort. Reference the fail-fast value in Stated values.
128+
- **Status**: `not-started`
129+
- **Notes**:
130+
131+
### CHUNK-009: reinventory-residual-errors
132+
- **Kind**: `investigate`
133+
- **Description**: After the NMFMerge-side fixes, re-run the `--trim` build and produce an accurate inventory of the residual errors, attributed per upstream package. Use it to tighten the scope (and dependency order) of CHUNK-010 .. CHUNK-013 — some may shrink dramatically once only the pinned algorithm's code path is compiled. Record the residual count and per-package breakdown in Notes.
134+
- **Status**: `not-started`
135+
- **Notes**:
136+
- **Depends on**: CHUNK-005, CHUNK-006
137+
138+
### CHUNK-010: trim-clean-nmf
139+
- **Kind**: `implement`
140+
- **Description**: Make NMF.jl trim-clean for the pinned algorithm's path: guard the verbose `Printf` iteration-table output so it is statically eliminable when `verbose=false`, and fix the `zeros(T::DataType,…)` / `rand(T::DataType,…)` type instability in `randinit` (`NMF/src/initialization.jl:5-7`). Delivery per CHUNK-004.
141+
- **Status**: `not-started`
142+
- **Notes**:
143+
- **Depends on**: CHUNK-009, CHUNK-004
144+
- **Cluster**: upstream-trim
145+
146+
### CHUNK-011: trim-clean-tsvd
147+
- **Kind**: `implement`
148+
- **Description**: Make TSVD.jl trim-clean: the `debug` `println` in `biLanczos` (`TSVD/src/svd.jl:177`) is statically reachable. Guard it so it is eliminable when `debug=false`. Delivery per CHUNK-004.
149+
- **Status**: `not-started`
150+
- **Notes**:
151+
- **Depends on**: CHUNK-009, CHUNK-004
152+
- **Cluster**: upstream-trim
153+
154+
### CHUNK-012: trim-clean-nonneglsq
155+
- **Kind**: `implement`
156+
- **Description**: Make NonNegLeastSquares.jl trim-clean: the `warn(...)` calls on unrecognized algorithm variants are statically reachable. Guard or restructure them. Delivery per CHUNK-004.
157+
- **Status**: `not-started`
158+
- **Notes**:
159+
- **Depends on**: CHUNK-009, CHUNK-004
160+
- **Cluster**: upstream-trim
161+
162+
### CHUNK-013: fix-gsvdinit-type-instability
163+
- **Kind**: `implement`
164+
- **Description**: Fix GsvdInitialization's genuine type instabilities surfaced by the verifier: the `Kronecker`+`SparseArrays` `hvcat`/`vcat` path in `gram_sp_C`, and the `nonneg_lsq` `Core.kwcall((alg=:fnnls, gram=true), …)` Symbol-keyword dispatch (`GsvdInitialization/src/GsvdInitialization.jl:~128,199-203`). These are algorithmic, not diagnostic — they need real type-stability work. Delivery per CHUNK-004.
165+
- **Status**: `not-started`
166+
- **Notes**:
167+
- **Depends on**: CHUNK-009, CHUNK-004
168+
- **Cluster**: upstream-trim
169+
170+
### CHUNK-014: report-juliac-bugs
171+
- **Kind**: `investigate`
172+
- **Description**: Report two `juliac`/Julia-1.13-rc findings upstream (JuliaC.jl / julia): (1) the crash in `juliac`'s failure path — `UndefVarError(:exit)` in `Base.Compiler` after the verifier reports errors; (2) `Core.invoke_in_world` — "trim verification not yet implemented for builtin". Record issue links in Notes.
173+
- **Status**: `not-started`
174+
- **Notes**:
175+
176+
### CHUNK-015: end-to-end-verification
177+
- **Kind**: `investigate`
178+
- **Description**: Terminal verification: confirm the `--trim` build completes cleanly and emits `libNMFMerge` + `nmfmerge_abi.json`; run that JSON through JuliaLibWrapping (`write_wrapper` for `CTarget` and `PythonTarget`); smoke-test the generated C header and Python package against the compiled library. Record what works and any remaining friction.
179+
- **Status**: `not-started`
180+
- **Notes**:
181+
- **Depends on**: CHUNK-005, CHUNK-006, CHUNK-007, CHUNK-008, CHUNK-010, CHUNK-011, CHUNK-012, CHUNK-013
182+
183+
## Session ledger
184+
<!-- The implementer appends one line after each session: `- YYYY-MM-DD CHUNK-XXX (name) → next: CHUNK-YYY` -->
185+
186+
## Open Questions
187+
- Coordination: once CHUNK-010 .. CHUNK-013 land as upstream releases, NMFMerge's
188+
`[compat]` bounds must be bumped to require the trim-clean versions, and
189+
NMFMerge itself likely needs a release. Sequence this after the
190+
`upstream-trim` cluster completes.
191+
- The compiled library carries the whole Julia runtime; shipping it to users
192+
without Julia (the actual Python/Matlab/R distribution story) is out of scope
193+
for this plan — track it with the JuliaLibWrapping work instead.

0 commit comments

Comments
 (0)