Phase D part 2: bump Nx/EXLA/Bumblebee, accept :emlx; v0.2.0

nshkrdotcom · nshkrdotcom · commit 017643ec2ec3 · 2026-05-21T12:37:11.000-10:00
This commit picks up the Nx-side fix that closes the Apple Silicon
SVD-OOM blocker upstream and bumps the surrounding dep stack
accordingly. Validated end-to-end on CUDA; validated end-to-end on
Apple by polvalente (Nx core) prior to this commit.

Deps:
- nx pinned to GitHub elixir-nx/nx@6424c89 (post-v0.12.0 main, carries
  elixir-nx/nx#1753: better memory footprint for thin SVD; both EMLX
  and EXLA benefit; the Apple OOM on the 151,936 x 1024 embedder is
  gone).
- exla pinned to the same Nx repo at the same commit (sparse: 'exla',
  v0.12.0).
- bumblebee bumped from github.com/elixir-nx/bumblebee@0fd8114
  (pre-v0.7.0) to github.com/elixir-nx/bumblebee@d0774e8 (post-v0.7.0
  main; required for Nx 0.12 compat).
- xla 0.10.x is the resolved version (was 0.9.x). cuda13 is newly
  accepted by the XLA preflight; cuda12 remains the recommended default.
- EMLX is deliberately NOT in our deps. optional: true does not prevent
  Mix from starting it on Linux/CUDA hosts where its Metal/MLX NIF
  cannot load. Apple users add {:emlx, '~> 0.3'} to their parent app;
  the :emlx runtime profile resolves the backend at runtime via
  Code.ensure_loaded?/1.

Documentation:
- README, guides/onboarding.md, docs/production_qwen_slm_profile.md
  updated with the new resolved dep versions.
- guides/troubleshooting.md 'XLA_TARGET=cuda13' section rewritten:
  cuda13 is now accepted; the rejection example uses cuda14.
- guides/troubleshooting.md 'EMLX OOM on the embedder SVD' section
  rewritten to credit both fixes (EMLX 0.3.0 thin-SVD fall-through +
  Nx PR #1753 default-impl refactor).
- guides/runtime_profiles.md EMLX caveats section updated to credit
  PR #1753 and note that polvalente confirmed end-to-end Apple
  validation (37/37 prompt eval pass) without EMLXAxon rewrites.
- docs/bumblebee_unpin_playbook.md updated to reflect the new ref.

Tests:
- test/build_support/xla_target_validator_test.exs updated for the
  cuda13-now-accepted reality and the new bundled xla 0.10.x message.

Version:
- mix.exs @version 0.1.0 -> 0.2.0.
- CHANGELOG: new 0.2.0 entry summarising Phase B, C, D.

Gates (all green on CUDA):
- mix format
- mix compile --warnings-as-errors
- mix test: 262 tests, 0 failures (was 261; +1 'accepts cuda13' case)
- mix credo --strict: 0 issues
- mix dialyzer: 0 errors
- mix docs --warnings-as-errors clean
- XLA_TARGET=cuda12 mix run examples/qwen_router_prompt_eval.exs
  --snapshot examples/fixtures/qwen_router_prompt_eval_logits.json
  --determinism-runs 2 -> 37/37 PASS
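
The EMLX entry in the Deps list says the :emlx profile resolves its
backend at runtime via Code.ensure_loaded?/1 instead of a compile-time
dep. A minimal sketch of that idea follows. The module name
BackendResolverSketch, the resolve/1 function, and the error strings
are hypothetical and exist only for illustration; this is not the
project's RuntimeProfile code, which this commit does not show. Only
the two backend tuples are taken from the commit.

```elixir
# Hypothetical sketch only; not TrinityCoordinator.RuntimeProfile.
defmodule BackendResolverSketch do
  @moduledoc false

  # Profile name -> Nx backend tuple, as named in the CHANGELOG entry.
  # Module names are plain atoms, so this compiles even when neither
  # EXLA nor EMLX is present in the dependency tree.
  @profiles %{
    cuda_exla: {EXLA.Backend, client: :cuda},
    emlx: {EMLX.Backend, device: :gpu}
  }

  @doc """
  Returns `{:ok, backend}` when the profile's backend module can be
  loaded, otherwise `{:error, message}` naming what is missing.
  """
  def resolve(name) do
    case Map.fetch(@profiles, name) do
      {:ok, {module, _opts} = backend} ->
        # Code.ensure_loaded?/1 returns false when the module is not
        # available, e.g. on a Linux/CUDA host without the emlx dep.
        if Code.ensure_loaded?(module) do
          {:ok, backend}
        else
          {:error,
           "runtime profile #{inspect(name)} needs #{inspect(module)}; " <>
             "add the matching dep to your application"}
        end

      :error ->
        {:error, "unknown runtime profile #{inspect(name)}"}
    end
  end
end
```

Under this sketch, BackendResolverSketch.resolve(:emlx) would only
succeed in an application that lists {:emlx, "~> 0.3"} in its own deps,
which matches the setup the commit describes for Apple Silicon users.
On other hosts the call returns an error tuple instead of failing to
load a NIF at startup.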
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,84 @@
 # Changelog
 
+## 0.2.0 — 2026-05-21
+
+### Highlights
+
+This release ships profile-driven backend selection (the foundation for
+Apple Silicon / EMLX support), the canonical `mix trinity.artifact.fetch`
+onboarding command, and a dep-stack bump that picks up
+[Nx PR #1753](https://github.com/elixir-nx/nx/pull/1753) (better memory
+footprint for thin SVD; the Apple-side blocker is now fixed upstream).
+
+### Added
+
+- `mix trinity.artifact.fetch` task and `TrinityCoordinator.ArtifactFetch`
+  module. Downloads the adapted-Qwen3 bundle from a HuggingFace dataset
+  repo with per-file SHA-256 verification. The pin file at
+  `priv/sakana_trinity/artifact_pin.json` is committed to the repo so a
+  fresh clone knows what to fetch before touching the network.
+- `TrinityCoordinator.RuntimeProfile.put_default_backend!/1` — single
+  bottleneck through which all backend bootstrapping flows. Profiles
+  whose backend module is not loaded raise a single informative error
+  naming the missing dep.
+- `TrinityCoordinator.RuntimeProfile.accepts_backend_label?/2` — used
+  by the exporter's per-tensor validation. Generalises the previous
+  CUDA-only check.
+- Real `:emlx` runtime profile. `nx_backend == {EMLX.Backend, device: :gpu}`.
+  Resolves to a working profile struct; users on Apple Silicon add
+  `{:emlx, "~> 0.3"}` to their parent app and pass
+  `--runtime-profile emlx` to the relevant Mix tasks / examples.
+- `--runtime-profile NAME` flag on `mix trinity.sakana.export_adapted`,
+  `mix trinity.sakana.router_trace`, and
+  `examples/qwen_router_prompt_eval.exs`. Default `cuda_exla` for
+  back-compat.
+- New guides: `guides/runtime_profiles.md`, `guides/artifact_distribution.md`.
+- Three new troubleshooting sections in `guides/troubleshooting.md`:
+  artifact-fetch failure modes, EMLX dep missing, EMLX OOM on the
+  embedder SVD (with the two upstream fixes).
+- `hf_hub ~> 0.2` dep (already on Hex).
+
+### Changed
+
+- **Dep stack bump.** `nx` pinned to GitHub
+  `elixir-nx/nx@6424c8902380380cd7a8c282b0557d653aead018` (post-v0.12.0
+  main, carries PR #1753 thin-SVD memory fix). `exla` pinned to the
+  same commit (sparse: "exla"). `bumblebee` pinned to
+  `elixir-nx/bumblebee@d0774e8ab8c4d5ac60ade95ec8dc9e1f0efd7306`
+  (post-v0.7.0 main).
+- **`xla` 0.10.x** is now the bundled version (was 0.9.x). `cuda13` is
+  newly accepted by the XLA preflight; `cuda12` remains the recommended
+  default.
+- `SLMProfile.qwen_coordinator/0` and `SLMProfile.qwen_sakana_adapted/0`
+  `load_options` no longer bake in `backend: {EXLA.Backend, client: :cuda}`.
+  They carry only `type: :bf16`; `Coordinator.load/1` injects the
+  runtime profile's backend at load time.
+- `Sakana.Exporter.ensure_cuda_backend/2` →
+  `ensure_export_backend/3`. Threads runtime profile through
+  `export_tensor/5` so per-tensor backend validation matches the profile
+  under which the export ran.
+- `Sakana.Exporter.load_profile/1` → `load_profile/2`. Uses
+  `RuntimeProfile.put_default_backend!/1` instead of CUDA-hard-coded
+  `Runtime.put_cuda_backend!/0`.
+- README `Model And Artifact Setup` rewritten to lead with
+  `mix trinity.artifact.fetch` instead of "use a blessed artifact bundle".
+
+### Fixed
+
+- Apple Silicon export OOM on the Qwen3-0.6B embedder
+  (151,936 × 1024 → ~92 GB U). Upstream fix in Nx PR
+  #1753 (now in our pin). Validated end-to-end on Apple by Paulo
+  Valente (polvalente, Nx core team) on 2026-05-21: full export +
+  37/37 prompt eval pass.
+
+### Notes
+
+- `{:emlx, "~> 0.3"}` is deliberately NOT in `mix.exs`. Marking it
+  `optional: true` would still fetch and start it on Linux/CUDA hosts
+  where its Metal/MLX NIF cannot load. Apple users add the dep to
+  their parent app; the `:emlx` runtime profile resolves the backend
+  at runtime via `Code.ensure_loaded?/1`.
+
 ## 2026-05-21
 
 ### Added
diff --git a/README.md b/README.md
@@ -269,11 +269,13 @@ workspaces, but they are not required for fresh-clone onboarding.
 
 Resolved core dependency lane:
 
-- `nx 0.10.0`
-- `exla 0.10.0`
+- `nx 0.12.0` (pinned to GitHub main commit
+  `6424c8902380380cd7a8c282b0557d653aead018` for
+  [PR #1753 thin SVD memory fix](https://github.com/elixir-nx/nx/pull/1753))
+- `exla 0.12.0` (pinned to the same commit)
 - `axon 0.7.0`
-- `bumblebee` pinned to `elixir-nx/bumblebee`
-  `0fd8114cf5429af9236f100f3350986e9d823c02`
+- `bumblebee` pinned to `elixir-nx/bumblebee` commit
+  `d0774e8ab8c4d5ac60ade95ec8dc9e1f0efd7306` (post-v0.7.0 main)
 
 ## Quick Verification
 
diff --git a/build_support/xla_target_validator.exs b/build_support/xla_target_validator.exs
@@ -16,13 +16,13 @@ defmodule XlaTargetValidator do
       `mix trinity.env.check`).
 
   The recognised target list is intentionally kept in lock-step with the
-  bundled `xla` version. As of `xla 0.9.x`, the supported set is
-  `cpu`, `cuda`, `cuda12`, `rocm`, `tpu`. The newer `xla 0.10.x` adds
-  `cuda13`; that bump is tracked separately (see
-  `docs/bumblebee_unpin_playbook.md`).
+  bundled `xla` version. As of `xla 0.10.x` (used by EXLA 0.12+), the
+  supported set is `cpu`, `cuda`, `cuda12`, `cuda13`, `rocm`, `tpu`.
+  `cuda12` remains the recommended default for CUDA hosts; `cuda13` is
+  newly accepted (the previous bundled `xla 0.9.x` rejected it).
   """
 
-  @supported_xla_targets ["cpu", "cuda", "cuda12", "rocm", "tpu"]
+  @supported_xla_targets ["cpu", "cuda", "cuda12", "cuda13", "rocm", "tpu"]
   @recommended "cuda12"
 
   @doc "Validates `XLA_TARGET`. Returns `:ok` or raises a `Mix.Error`."
@@ -77,7 +77,7 @@ defmodule XlaTargetValidator do
     accepted = Enum.map_join(@supported_xla_targets, ", ", &inspect/1)
 
     Mix.raise(
-      "XLA_TARGET=#{inspect(value)} is not accepted by the bundled xla 0.9.x. " <>
+      "XLA_TARGET=#{inspect(value)} is not accepted by the bundled xla 0.10.x. " <>
         "Accepted values: #{accepted}. " <>
         "Recommended for CUDA hosts: export XLA_TARGET=#{@recommended}. " <>
         "Recommended for CPU hosts: unset XLA_TARGET (or use cpu). " <>
diff --git a/docs/bumblebee_unpin_playbook.md b/docs/bumblebee_unpin_playbook.md
@@ -1,12 +1,13 @@
 # Bumblebee Unpin Playbook
 
 This playbook is the 15-minute job to take when a Bumblebee Hex release
-lands that includes Qwen3 support at or after commit
-`0fd8114cf5429af9236f100f3350986e9d823c02`.
+lands that includes Qwen3 support at or after Bumblebee v0.7.0.
 
-Until then, `mix.exs` pins Bumblebee to that commit on
-`elixir-nx/bumblebee`, and `mix trinity.gates --include-hex-build` treats
-the `hex_build_advisory` step as non-blocking by design.
+As of 2026-05-21, `mix.exs` pins Bumblebee to commit
+`d0774e8ab8c4d5ac60ade95ec8dc9e1f0efd7306` on `elixir-nx/bumblebee`
+(post-v0.7.0 main; carries Qwen3 + Nx 0.12 compat needed for EMLX
+support). `mix trinity.gates --include-hex-build` treats the
+`hex_build_advisory` step as non-blocking by design.
 
 This playbook re-promotes that gate to blocking once Bumblebee is
 unpinned.
diff --git a/docs/production_qwen_slm_profile.md b/docs/production_qwen_slm_profile.md
@@ -86,33 +86,38 @@ model = TrinityCoordinator.CoordinationHead.build_model(hidden_size, num_agents,
 
 The checked-in dependency lane is:
 
-- `bumblebee` pinned to upstream `elixir-nx/bumblebee`
-  `0fd8114cf5429af9236f100f3350986e9d823c02`
+- `nx` pinned to GitHub `elixir-nx/nx`
+  `6424c8902380380cd7a8c282b0557d653aead018` (post-v0.12.0 main,
+  carries [PR #1753](https://github.com/elixir-nx/nx/pull/1753) thin-SVD
+  memory-footprint fix). When Nx 0.13 lands on Hex, the pin moves to
+  `{:nx, "~> 0.13"}`.
+- `exla` pinned to the same Nx repo at the same commit (sparse: "exla").
 - `axon ~> 0.7`
-- `nx ~> 0.9`
-- `exla ~> 0.9`
+- `bumblebee` pinned to upstream `elixir-nx/bumblebee`
+  `d0774e8ab8c4d5ac60ade95ec8dc9e1f0efd7306` (post-v0.7.0 main).
 
-On this host, that lane is verified with `XLA_TARGET=cuda12`. Hex
-`bumblebee 0.6.3` does not ship Qwen3, so this repo pins the upstream Bumblebee
-commit that includes `Bumblebee.Text.Qwen3` and its Hugging Face parameter
-mapping.
+On this host, that lane is verified with `XLA_TARGET=cuda12`. Bumblebee
+v0.7.0 ships Qwen3 via Hex; the post-main pin picks up minor fixes
+landed after the release.
 
 ### `qwen_cuda_ready` outcome
 
 Current resolved versions used for this outcome:
 
-- `bumblebee` git ref `0fd8114cf5429af9236f100f3350986e9d823c02`
+- `nx 0.12.0` (GitHub commit above)
+- `exla 0.12.0` (GitHub commit above)
 - `axon 0.7.0`
-- `nx 0.10.0`
-- `exla 0.10.0`
+- `bumblebee` git ref `d0774e8ab8c4d5ac60ade95ec8dc9e1f0efd7306`
 
 Outcome: `qwen_cuda_ready` is active for base Qwen hidden-state extraction.
 `SLMProfile.qwen_coordinator/0` uses:
 
 - repo: `{:hf, "Qwen/Qwen3-0.6B"}`
 - module: `Bumblebee.Text.Qwen3`
 - architecture: `:for_causal_language_modeling`
-- load options: `backend: {EXLA.Backend, client: :cuda}`, `type: :bf16`
+- load options: `type: :bf16` (the backend is injected at load time
+  by `Coordinator.load/1` based on the active `RuntimeProfile`;
+  see `guides/runtime_profiles.md`).
 - expected hidden size: `1024`
 
 Hidden states are enabled at prediction time with Axon's global layer option
diff --git a/guides/onboarding.md b/guides/onboarding.md
@@ -65,11 +65,13 @@ The current development lane assumes:
 
 The resolved Elixir dependency lane currently uses:
 
-- `nx 0.10.0`
-- `exla 0.10.0`
+- `nx 0.12.0` (pinned to GitHub main commit
+  `6424c8902380380cd7a8c282b0557d653aead018` for
+  [PR #1753 thin SVD memory fix](https://github.com/elixir-nx/nx/pull/1753))
+- `exla 0.12.0` (pinned to the same commit)
 - `axon 0.7.0`
-- `bumblebee` pinned to `elixir-nx/bumblebee` ref
-  `0fd8114cf5429af9236f100f3350986e9d823c02`
+- `bumblebee` pinned to `elixir-nx/bumblebee` commit
+  `d0774e8ab8c4d5ac60ade95ec8dc9e1f0efd7306` (post-v0.7.0 main)
 
 ## First Commands
 
diff --git a/guides/runtime_profiles.md b/guides/runtime_profiles.md
@@ -57,21 +57,27 @@ mix run examples/qwen_router_prompt_eval.exs --runtime-profile emlx \
 
 #### EMLX-specific Caveats
 
-- **Thin SVD.** EMLX v0.3.0 routes `Nx.LinAlg.svd/2` with
-  `full_matrices?: false` through Nx's default implementation, which
-  avoids materialising the full `m × m` U on the Qwen3-0.6B embedder
-  (where `m = 151_936`, i.e. ~92 GB of U). The default path uses
-  `eigh`; for the small-σ tail to stay precise, pass
-  `--svd-compute-type f32` to `mix trinity.sakana.export_adapted` on
-  Apple.
+- **Thin SVD memory footprint.** Nx main as of commit `6424c89` (Paulo
+  Valente, [PR #1753](https://github.com/elixir-nx/nx/pull/1753))
+  refactored `Nx.LinAlg.svd/2` with `full_matrices?: false` so it does
+  not materialise the full `m × m` U on the Qwen3-0.6B embedder
+  (where `m = 151_936`, i.e. ~92 GB of U under the old path).
+  This fix is in the Nx version that `trinity_coordinator` pins to.
+  Both EMLX and EXLA benefit from this change.
+- **`--svd-compute-type f32`.** Recommended on Apple. The thin-SVD
+  path uses an `eigh` decomposition under the hood; doing that work
+  in f32 keeps the small-σ tail precise.
 - **Backend label.** When the exporter validates per-tensor backend
   during the SVD reconstruction step, it accepts the
   `"EMLX.Backend"` label as well as `"EXLA.Backend<cuda:"`. No code
   changes needed for the user.
 - **Bumblebee Qwen3 support.** Bumblebee is git-pinned to a Qwen3-
-  supporting commit (`mix.exs`). EMLXAxon
-  (`https://github.com/elixir-nx/emlx`) has independently validated
-  Qwen3-0.6B loading through the EMLX backend.
+  supporting commit (post-v0.7.0 main). EMLXAxon
+  ([github.com/elixir-nx/emlx](https://github.com/elixir-nx/emlx)) has
+  independently validated Qwen3-0.6B loading through the EMLX backend.
+  Paulo Valente confirmed on 2026-05-21 that running with the bare
+  EMLX backend (no `EMLXAxon.rewrite/1`) successfully exports and
+  passes 37/37 on the prompt eval.
 - **bf16 round-trip.** The bundle is bf16 safetensors. EMLX accepts
   bf16 natively (`{:bf, 16}` ↔ MLX `bfloat16`). No quantisation or
   type cast required.
diff --git a/guides/troubleshooting.md b/guides/troubleshooting.md
@@ -166,29 +166,37 @@ If CUDA is missing, verify:
 - EXLA dependency target;
 - environment isolation, especially shells launched without CUDA env vars.
 
-## XLA_TARGET=cuda13 Is Rejected At Compile Time
+## XLA_TARGET Rejected At Compile Time
 
-`xla 0.9.1` does not accept `cuda13`. If `mix deps.compile xla` reports
-an unsupported target, set:
+`xla 0.10.x` (which EXLA 0.12+ uses) accepts:
+
+```text
+cpu, cuda, cuda12, cuda13, rocm, tpu
+```
+
+Anything else is rejected at compile time. The most common error mode
+is a stale shell export like `XLA_TARGET=cuda14`. Set:
 
 ```bash
 export XLA_TARGET=cuda12
 ```
 
-This applies even on hosts whose installed CUDA toolkit is 13.x. The
-`XLA_TARGET` controls which prebuilt XLA artifact is fetched; mismatched
-host CUDA installations are tolerated by EXLA via dynamic loading.
+`cuda12` is the canonical recommended default for CUDA hosts even when
+the host installed toolkit is 13.x — the `XLA_TARGET` controls which
+prebuilt XLA artifact is fetched; mismatched host CUDA installations
+are tolerated by EXLA via dynamic loading. Use `cuda13` when you
+specifically want the cuda13 prebuilt.
 
 ### Automatic preflight
 
-As of 2026-05-21, the project surfaces this automatically via a Mix
+The project surfaces unsupported targets automatically via a Mix
 preflight that runs from `mix.exs` before any compilation step. An
-operator whose shell exports `XLA_TARGET=cuda13` will see a single
-readable line instead of an EXLA stacktrace:
+operator whose shell exports an unsupported `XLA_TARGET` will see a
+single readable line instead of an EXLA stacktrace:
 
 ```text
-** (Mix.Error) XLA_TARGET="cuda13" is not accepted by the bundled xla 0.9.x.
-Accepted values: "cpu", "cuda", "cuda12", "rocm", "tpu".
+** (Mix.Error) XLA_TARGET="cuda14" is not accepted by the bundled xla 0.10.x.
+Accepted values: "cpu", "cuda", "cuda12", "cuda13", "rocm", "tpu".
 Recommended for CUDA hosts: export XLA_TARGET=cuda12.
 Recommended for CPU hosts: unset XLA_TARGET (or use cpu).
 The bundled xla rejects unrecognised targets at compile time, so EXLA
@@ -295,9 +303,22 @@ precise.
 
 ### EMLX OOM on the embedder SVD
 
-If your EMLX version is older than v0.3.0, the native SVD path
-materialises the full `m × m` U matrix on the Qwen3-0.6B embedder
-(`m = 151_936`, ~92 GB). Upgrade to `{:emlx, "~> 0.3"}` — Paulo
-Valente's commit `3482b79` ("fix: use nx-defined implementation for
-non-full svd computation") routes `full_matrices?: false` through
-Nx's default path and keeps the work at `min(m, n)² = 1024²`.
+The Qwen3-0.6B embedder is `151_936 × 1024`. Before two fixes
+landed, the SVD of this matrix tried to materialise a full `m × m`
+U matrix — about 92 GB. The fix landed in two places:
+
+1. **EMLX v0.3.0** routed `Nx.LinAlg.svd/2` with `full_matrices?: false`
+   through Nx's default implementation instead of MLX's native SVD
+   (which always allocates the full U). Commit `3482b79`, Paulo
+   Valente, "fix: use nx-defined implementation for non-full svd
+   computation".
+2. **Nx main commit `6424c89`**
+   ([PR #1753](https://github.com/elixir-nx/nx/pull/1753)) refactored
+   the default thin-SVD path itself to keep the working set bounded
+   by `min(m, n)²`. Both EMLX and EXLA benefit from this fix.
+
+`trinity_coordinator` pins the post-#1753 Nx (see `mix.exs`), so a
+user who runs `mix trinity.sakana.export_adapted --runtime-profile emlx`
+on Apple Silicon with `{:emlx, "~> 0.3"}` in their parent app gets
+the bounded-memory path automatically. If you see an OOM, confirm
+your Nx version: it should be 0.12.x or later.
diff --git a/mix.exs b/mix.exs
@@ -31,7 +31,7 @@ XlaTargetValidator.validate_root_project!(__DIR__)
 defmodule TrinityCoordinator.MixProject do
   use Mix.Project
 
-  @version "0.1.0"
+  @version "0.2.0"
 
   def project do
     [
@@ -79,16 +79,41 @@ defmodule TrinityCoordinator.MixProject do
     [
       # {:dep_from_hexpm, "~> 0.3.0"},
       # {:dep_from_git, git: "https://github.com/elixir-lang/my_dep.git", tag: "0.1.0"}
-      {:nx, "~> 0.9"},
+      # Nx is pinned to GitHub main to pick up
+      # https://github.com/elixir-nx/nx/pull/1753 (refactor: better memory
+      # footprint for thin svd, polvalente, 2026-05). The thin-SVD path
+      # avoids materialising the full m×m U matrix on the Qwen3-0.6B
+      # embedder (m = 151,936) regardless of backend, which is what
+      # makes the Apple/EMLX export viable without OOM. CUDA also
+      # benefits (smaller working set during the embedder factorisation).
+      # Pin moves to {:nx, "~> 0.13"} once Nx 0.13 is on Hex.
+      {:nx,
+       github: "elixir-nx/nx",
+       sparse: "nx",
+       ref: "6424c8902380380cd7a8c282b0557d653aead018",
+       override: true},
+      # EXLA pulled from the same Nx repo so the in-tree :nx version
+      # matches what EXLA expects (both at 0.12 + thin-SVD PR).
+      {:exla,
+       github: "elixir-nx/nx",
+       sparse: "exla",
+       ref: "6424c8902380380cd7a8c282b0557d653aead018",
+       override: true},
       {:axon, "~> 0.7"},
-      # Pinned to a Qwen3-supporting commit until a Bumblebee Hex release
-      # lands that includes it. To unpin, follow
-      # docs/bumblebee_unpin_playbook.md.
+      # Bumblebee main (post-v0.7.0). Qwen3 is on Hex via v0.7.0 but
+      # main has additional fixes; once Hex 0.8 lands, switch to
+      # {:bumblebee, "~> 0.8"} per docs/bumblebee_unpin_playbook.md.
       {:bumblebee,
        github: "elixir-nx/bumblebee",
-       ref: "0fd8114cf5429af9236f100f3350986e9d823c02",
+       ref: "d0774e8ab8c4d5ac60ade95ec8dc9e1f0efd7306",
        override: true},
-      {:exla, "~> 0.9"},
+      # NOTE: EMLX is deliberately NOT listed here. Marking it
+      # optional: true would still cause Mix to fetch and start EMLX on
+      # any host (incl. Linux/CUDA), whose Metal/MLX NIF cannot load.
+      # Apple Silicon users add {:emlx, "~> 0.3"} to their own
+      # application's deps; the :emlx runtime profile then resolves to
+      # the EMLX.Backend at runtime via Code.ensure_loaded?/1. See
+      # guides/runtime_profiles.md.
       DependencySources.dep(:inference, __DIR__),
       DependencySources.dep(:agent_session_manager, __DIR__),
       DependencySources.dep(:gemini_cli_sdk, __DIR__),
diff --git a/mix.lock b/mix.lock
diff --git a/test/build_support/xla_target_validator_test.exs b/test/build_support/xla_target_validator_test.exs