All notable changes to this project are documented in this file. The format is loosely based on Keep a Changelog.
A weekly, async, multi-persona LLM retrospective that runs every Saturday at 01:00 and feeds its verdict back into the real-time consensus as the 11th weighted vote (9.5%). Adapted from 0xNyk/council-of-high-intelligence. See docs/ADR-003-weekend-council-11th-voice.md.
- 6 personas on 5 distinct model families for genuine reasoning diversity (not costume changes on one model): Le Stratège (Gemma 4 12B), Le Gestionnaire de Risque (GLM-4.6V-Flash), Le Quant (Qwen 3.5 9B), Le Sceptique (LFM 2.5 Mamba), Le Tacticien (Mistral Nemo 12B), Le Comportementaliste (LFM 2.5). The Judge runs on Qwen3.5-9B-MTP (IFEval 91.5, 262K context).
- 4-round protocol with anti-groupthink: (0) Problem Restate Gate, (1) targeted analysis with explicit
STANCE: BUY|SELL|HOLD (X%), dissent quota that forces a steelman if ≥2/3 majority converges, (2) 1-vs-1 directed debate, (3) Judge verdict (Unresolved-Questions-first). - Level 3 integration — the Judge emits a parseable
VERDICT_TICKER:block per ticker;get_council_ticker_stance()parses it and adds aModelDecision(model_name="council")at weight 0.10. Confidence decays linearly (full at day 0 → 0 at day 7). Council is exempt from the adaptive weight loop (fixed_weight_models) since its correctness can't be measured against per-cycle market direction. - Real PROD context — analyses model accuracy (
model_performance.db), portfolio metrics + critical alerts (performance_monitor.db), and the trading journal (bias detector), not a generic market feed. - Generous token budgets tuned for thinking models on CPU:
num_predictup to 12000 (Judge),num_ctxup to 65536, 15-min per-call Ollama timeout, 48-hour scheduler window (COUNCIL_TIMEOUT = 172800). - Setup:
uv run python setup_council_models.pyinstalls the 6 required models (~40 GB, idempotent).schedule_test.pyforces a one-off trigger for validation.
A thorough audit of the initial Level 3 commit found the council vote was inert in production. The fixes (commit 2ccd26c):
- Ticker mismatch (CRITICAL) — the call site passed
self.analysis_ticker(^NDX) but the verdict uses trading tickers (SXRV.DE); never matched → always skipped. Now passesself.ticker. - Freshness duplication — unified into
_load_fresh_council_report()(text-injection and vote previously had separate, drift-prone copies). - Adaptive weight drift —
fixed_weight_modelsexemption prevents the manager from silently rescaling the 0.10 weight via a neutral performance score. - Parser robustness — French decimal comma (
0,65), percent rescaling (85→0.85),rfindfor block isolation. - DB pollution — council excluded from outcome tracking (no resolvable outcome).
tests/test_weekend_council.py(22 tests): routing, dissent quota, vote tally, models footer, graceful degradation.tests/test_llm_client.py:TestCouncilTickerStance(8 tests) +TestStripThinkingDebris(4) +TestCouncilVerdictExtraction(5) +TestCouncilVerdictContext(4).tests/test_enhanced_decision_engine.py:TestCouncilVoteIntegration(4 tests) — confirms a council SELL vote measurably lowers the weighted score.- Full suite: 84 green, Ruff clean.
audit_prod_logs.py— new standalone auditor that validates all files inlogs_prod/(catalogue, SQLite integrity/row-counts, parquet freshness + June-2026 coverage, JSON search-query caches, pickle models + TensorTrade metadata), runs a corrected backtest against the prod cache (logs_prod/data_cache/, current — not the stale repo-rootdata_cache/), and a dedicated FinAcumen section (state-file analysis + deterministic tool-chain proof). Emitslogs_prod/audit_report.mdwith an OK/WARN/FAIL verdict.
FinAcumen (src/finacumen_main.py) converged to status: success after fixing six bugs in src/core/tools.py and src/agents/solver.py. Verified live with Ollama + gemma-4-12b: CRUDP.PA → HOLD 0.75, SXRV.DE → BUY 0.85, each citing real fetched prices (close, SMA50/200, RSI, MACD).
lookup_ohlcAPI mismatch — now acceptsindicator: str | list[str]; a list returns a{indicator: value}dict (the form the LLM always generated). A single string still returns a float (backward compatible).- Missing indicators — added
rsi(Wilder 14),sma_50,sma_200,ema_12,ema_26,macd, computed from the yfinance history. - Symbol mapping — aliases (
WTI,NASDAQ,NDX,BRENT,SP500) plus direct pass-through of any yfinance ticker (CRUDP.PA,SXRV.DE). - Sandbox
__import__ban —NumericalReasoningEnginenow pre-injectspd(pandas) andnp(numpy) so the LLM never needsimport(the__import__restriction is preserved for security). - Invisible fetched data — the solver observation now echoes the
datavariable when the LLM assignsdata = lookup_ohlc(...)withoutprint-ing it (the root cause of the prod fetch-loop timeouts). - Branch logic — the solver now distinguishes execute (
python_codenon-empty) from final answer (action in BUY|SELL|HOLD). Previously it always took the execute branch becausepython_codeis a mandatory schema key, so final answers were never accepted. - Solver prompt — documents the real
lookup_ohlcsignature, tells the LLM to call it once thenprint, then decide (no fetch loop).max_iterationsraised 5 → 6.
tests/test_finacumen.pyextended from 2 → 8 tests: regression coverage for all six bugs (list→dict, derived indicators, single-string backward compat, pd/np without import,__import__still blocked, full LLM-style code in the sandbox). Mocked suite 20/20 green.
- FinAcumen is repaired and verified converging. It is coupled to the main project via shared data:
main.pywritestrading_journal.csv/trading.log/performance_monitor.db, whichmorning_briefreads;schedule.pythen appends the FinAcumen section intomorning_market_brief.md. So FinAcumen influences the daily decision brief. It is not a per-cycle vote in the real-time consensus (enhanced_decision_engine.py/model_performance.db); wiring it as an 11th vote is a deliberate follow-up. backtest_prod.pystill reads the stale repo-rootdata_cache/; useaudit_prod_logs.pyfor a backtest against the prod cache.
- Think mode re-enabled on Gemma 4 12B. The
<|think|>token is now present in all four production system prompts (src/llm_client.py:188,src/llm_client.py:236,src/oil_bench_model.py:158,src/web_researcher.py:205). Restores the model's internal reasoning channel. - Dual-layer JSON defence documented as the load-bearing architecture for JSON-extraction safety:
- Layer 1 (load-bearing):
format: SCHEMA_*withadditionalProperties: false, enforced server-side by Ollama. - Layer 2 (belt-and-braces): defensive system-prompt suffix
"...never add a 'thought' key.".
- Layer 1 (load-bearing):
- Architecture Decision Record
docs/ADR-001-think-mode-dual-layer-defence.mdcapturing the rationale, validation evidence, and reversal procedure. - Top-of-file docstring in
tests/check_llm_json.pyupdated to clarify that*_v1_buggycases are now the production path, with two acceptable outcomes documented (all-OK or*_v1_buggyfail under loose format → negative-result validation).
README.md"Advanced Cognition" bullet: replaced the "disables the model's thinking mode" wording with the dual-layer defence explanation and a pointer to the ADR.memory-bank/techContext.md: thinking mode now described as enabled, with the two-layer defence explained.memory-bank/activeContext.md: new "Think Mode Re-enabled (2026-06-06)" entry under "Key Recent Changes".memory-bank/systemPatterns.md: new "Dual-Layer JSON Defence (Think Mode)" pattern documented.memory-bank/progress.md: new dated correction entry (2026-06-06) under "Corrections Récentes".
- Mocked pytest gate: 12/12 pass (
test_llm_client.py,test_llm_prompts.py,test_oil_bench_model.py). - Live
tests/check_llm_json.pyharness: 6/10 OK — all schema-strict cases pass with<|think|>active; failures are exclusively on looseformat:json(not used in production). - End-to-end
uv run main.py --t212: exit 0, 4.66 min total, 2 newtrading_journal.csvrows, 0"Could not find valid JSON"log lines. - All four
<|think|>-enabled LLM call sites produced validated JSON in real production conditions.
- Soft:
git revert <merge-commit>orgit switch main(pre-merge). - Hard:
git switch main && git branch -D think-mode(pre-merge, erases branch). - Targeted: remove
"<|think|> "prefix from the four"system"strings — schema layer keeps carrying safety.