Skip to content

cboone/zhang-yeung-inequality

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

285 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The Zhang-Yeung Inequality

A Lean 4 formalization of the Zhang-Yeung conditional information inequality, a non-Shannon-type bound on Shannon entropy that cannot be derived from the Shannon inequalities alone.

The central object is the Zhang-Yeung delta,

Δ(Z, U | X, Y) := I(Z ; U) - I(Z ; U | X) - I(Z ; U | Y),

and the main theorem (Theorem 3 of Zhang and Yeung 1998) is the upper bound

Δ(Z, U | X, Y) ≤ (1/2) · I(X ; Y) + (1/4) · (I(X ; Z,U) + I(Y ; Z,U)).

The proof rests on the copy lemma, which identifies the inequality as genuinely non-Shannon: no amount of Shannon-type reasoning can replace the copy construction.

Status

Milestone Title State
M0 Project scaffolding done
M1 Delta equational lemmas done
M1.5 Theorem 2 (standalone conditional result) done
M2 The copy lemma done
M3 Theorem 3 (main inequality) done
M4 Theorem 4 (Shannon-cone separation) done
M5 Theorem 5 (n+2-variable generalization) done
M6 Polish and release in progress

M1.5 is complete: the public ZhangYeung.theorem2 is fully proved with no sorry. The proof factors into (a) a Shannon-algebra reduction of the target inequality to Δ(Z, U | X, Y) ≤ 0 under the hypotheses I[X:Y] = I[X:Y|Z] = 0, and (b) a non-Shannon closing argument via the [@zhangyeung1997] auxiliary / PMF construction and Real.sum_mul_log_div_leq. Kaced and Romashchenko [@kaced2013] classify this inequality as essentially conditional -- it fails on the closure of the entropic region, so no Lagrange combination of Shannon-type inequalities and the premises suffices, and the KL route is effectively the only known proof. To our knowledge this is the first formalization of any non-Shannon-type information inequality in any proof assistant.

M2 is complete: the copy lemma of [@zhangyeung1998, §III eqs. 44-45] is formalized as the existential copyLemma, which wraps PFR's ProbabilityTheory.condIndep_copies on the pair ⟨X, Y⟩ conditioned on the shared variable ⟨Z, U⟩ and produces an extended law ν with a second conditionally-independent copy (X₁, Y₁) of (X, Y) over (Z, U). Lemma 2 of the same paper (eq. 45) ships both as the abstract Shannon identity delta_of_condMI_vanishes_eq, which holds whenever a single conditional mutual information vanishes, and as six Zhang-Yeung-flavored corollaries specialized to the copy's projections: copyLemma_delta_transport_Y_to_Y₁ and copyLemma_delta_transport_X_to_X₁ transport Δ between μ and ν; copyLemma_delta_identity_Y₁ and copyLemma_delta_identity_X_X₁ instantiate Lemma 2 at the copy; and copyLemma_delta_le_mutualInfo_Y₁ and copyLemma_delta_le_mutualInfo_X_X₁ combine the identity with nonnegativity of the three conditional-mutual-information terms on its right-hand side to yield the two Δ ≤ I(·;·) inequalities that feed the Shannon chase for Theorem 3.

M3 is complete: Theorem 3 of [@zhangyeung1998, §III eqs. 21-23], the headline Zhang-Yeung inequality and the first known non-Shannon-type linear inequality on Shannon entropies of four random variables, is formalized in all three of the paper's equivalent forms. The public theorems zhangYeung (eq. 21, asymmetric), zhangYeung_dual (eq. 22, the X ↔ Y swap), and zhangYeung_averaged (eq. 23, the averaged symmetric form displayed in this README's header) sit on top of a single private zhangYeung_integer that closes the paper's Shannon chase at the integer-scaled shape 2·I[Z:U] - 3·I[Z:U|X] - I[Z:U|Y] ≤ I[X:Y] + I[X:⟨Z,U⟩]; the three public wrappers route this through the M1 form-conversion lemmas delta_form21_iff, delta_form22_iff, delta_form23_iff, and delta_form23_of_form21_form22. The chase itself takes the two M2 copy-lemma inputs (copyLemma_delta_le_mutualInfo_Y₁ and copyLemma_delta_le_mutualInfo_X_X₁), adds a generic three-way mutual-information identity, invokes data processing for PFR's CondIndepFun on the copy (via the ZhangYeung.condIndepFun_comp helper promoted to ZhangYeung/Prelude.lean as part of M3), and transports the result back to μ through IdentDistrib.mutualInfo_eq.

M4 is complete: Theorem 4 of [@zhangyeung1998, §II eq. 26], the scientific payoff of the Zhang-Yeung inequality, is formalized in its exact closure form. The public theorem4 proves ∃ F ∈ Γ_4, F ∉ closure(Γ_4^*), and theorem4_ge_four lifts this to all n ≥ 4. The proof combines (a) a -valued counterexample witness F_witness_ℚ on Finset (Fin 4) (the paper's explicit set function on lines 368-377, specialized to a = 1) whose Shannon-cone proof now uses decide for the empty / monotonicity branches and a structural submodularity proof for the exceptional pair behavior; (b) the specific Zhang-Yeung violation at the canonical labeling (2, 3, 0, 1) via the permutation Equiv.swap 0 2 * Equiv.swap 1 3; (c) a set-function bridge zhangYeungHolds_of_entropy that restates M3's zhangYeung at the set-function level over a heterogeneous family X : ∀ i : Fin 4, Ω → S i; and (d) a generic entropic-region layer (entropyFn_n, entropyRegion_n, almostEntropicRegion_n, restrictFirstFour) that packages the witness argument as the literal separation \bar{Γ}_n^* \neq Γ_n. The finite witness theorem is retained as theorem4_finite, and the stronger cone-level separation is retained as shannon_incomplete_ge_four.

M5 is complete: Theorem 5 of [@zhangyeung1998, §III eqs. 27-28], the n + 2-variable Zhang-Yeung generalization, is formalized as the public theorems theorem5 and theorem5_averaged. The proof lifts the M3 copy-lemma chase to a tuple-valued copy of the full Fin n-indexed side-variable family, sums the pairwise Lemma 2 inequalities, controls the resulting mutual-information sum by a private mutualInfo_add_n_way_inequality, and closes by data processing plus tuple-level marginal transports. As part of that landing, ZhangYeung/Prelude.lean now exposes the generic helpers IdentDistrib.condMutualInfo_eq, mutualInfo_add_three_way_identity, and mutualInfo_le_of_condIndepFun alongside condIndepFun_comp.

The living roadmap is in docs/plans/todo/2026-04-15-zhang-yeung-formalization-roadmap.md.

Formalization Scope

  • Finite-alphabet, real-valued probability distributions.
  • Built on top of PFR's entropy API (H[X], I[X:Y], I[X:Y|Z]) from teorth/pfr, pinned in lakefile.toml.
  • Measurability and finite-range side conditions expressed via Mathlib's [MeasurableSpace], [Fintype], and [MeasurableSingletonClass] typeclasses.
  • Deferred: infinite alphabets, computable entropy, upstream push of the copy lemma into PFR or Mathlib.

Module Layout

  • ZhangYeung.lean — project entrypoint; re-exports ZhangYeung.CopyLemma, ZhangYeung.Delta, ZhangYeung.EntropyRegion, ZhangYeung.Prelude, ZhangYeung.Theorem2, ZhangYeung.Theorem3, ZhangYeung.Theorem4, and ZhangYeung.Theorem5.
  • ZhangYeung/Prelude.lean — import surface for PFR's entropy API, plus the generic helpers ZhangYeung.condIndepFun_comp, IdentDistrib.condMutualInfo_eq, ZhangYeung.mutualInfo_add_three_way_identity, and ZhangYeung.mutualInfo_le_of_condIndepFun.
  • ZhangYeung/Delta.lean — M1 delta quantity and equational lemmas (delta_def, delta_comm_cond, delta_comm_main, delta_self, delta_eq_entropy, delta_form21_iff, delta_form22_iff, delta_form23_iff, delta_form23_of_form21_form22, delta_le_mutualInfo).
  • ZhangYeung/EntropyRegion.lean — generic entropic-region infrastructure for Theorem 4: entropyFn_n, the set-level regions shannonRegion_n / entropyRegion_n / almostEntropicRegion_n, and the first-four restriction map restrictFirstFour with its entropic and almost-entropic transport lemmas.
  • ZhangYeung/Theorem2.lean — M1.5 Zhang-Yeung conditional information inequality (theorem2), a standalone formalization of the first known non-Shannon-type conditional information inequality; factored into a Shannon-algebra reduction (theorem2_shannon_identity) and the KL-divergence core (theorem2_delta_le_zero) with supporting lemmas ptilde_sum_eq_one, phat_sum_eq_one, delta_eq_sum_log_ratio, sum_joint_eq_sum_ptilde.
  • ZhangYeung/CopyLemma.lean — M2 Zhang-Yeung copy lemma (§III eqs. 44-45 of the 1998 paper); ships the copyLemma existential (a thin wrapper over PFR's ProbabilityTheory.condIndep_copies), the abstract Lemma 2 delta_of_condMI_vanishes_eq, and six copy-projection corollaries (copyLemma_delta_transport_Y_to_Y₁, copyLemma_delta_transport_X_to_X₁, copyLemma_delta_identity_Y₁, copyLemma_delta_identity_X_X₁, copyLemma_delta_le_mutualInfo_Y₁, copyLemma_delta_le_mutualInfo_X_X₁) covering the two delta transports, two delta identities, and two Δ ≤ I(·;·) inequalities feeding the Shannon chase for Theorem 3.
  • ZhangYeung/Theorem3.lean — M3 Zhang-Yeung inequality (Theorem 3 of the 1998 paper, §III eqs. 21-23): zhangYeung (eq. 21), zhangYeung_dual (eq. 22), and zhangYeung_averaged (eq. 23), all derived from a private zhangYeung_integer that closes the paper's Shannon chase on the copy joint and routes through the M1 form-conversion lemmas.
  • ZhangYeung/Theorem4.lean — M4 Shannon incompleteness at n = 4 and n ≥ 4 (Theorem 4 of the 1998 paper, §II eq. 26): the set-function calculus (I_F, condI_F, delta_F), the cone predicates (shannonCone, zhangYeungAt, zhangYeungHolds), the rational-valued counterexample witness (F_witness_ℚ, F_witness), the finite auxiliary theorem4_finite, the exact closure theorems theorem4 and theorem4_ge_four, the sequence-level auxiliary theorem4_seqClosure, and the stronger cone-level corollary shannon_incomplete_ge_four.
  • ZhangYeung/Theorem5.lean — M5 Theorem 5, the n + 2-variable Zhang-Yeung generalization (1998, §III eqs. 27-28): the tuple-copy projection helpers, the private mutualInfo_add_n_way_inequality chain-rule bound, and the public theorems theorem5 and theorem5_averaged.
  • ZhangYeungTest.lean + ZhangYeungTest/Delta.lean + ZhangYeungTest/EntropyRegion.lean + ZhangYeungTest/Theorem2.lean + ZhangYeungTest/CopyLemma.lean + ZhangYeungTest/Theorem3.lean + ZhangYeungTest/Theorem4.lean + ZhangYeungTest/Theorem5.lean — compile-time API regression tests for each module.

Build and Verify

Fresh clone or worktree:

bin/bootstrap-worktree      # mandatory: downloads Mathlib + PFR prebuilt artifacts, then builds

Day-to-day commands:

make build                  # lake build ZhangYeung
make test                   # lake test (ZhangYeungTest example suite)
make lean-lint              # lake lint (batteries/runLinter)
make lint                   # markdownlint-cli2 + cspell
make check                  # lint + lean-lint + build + test
make help                   # show all targets

See AGENTS.md for the full command reference and CONTRIBUTING.md for the development workflow.

Dependencies

  • Lean 4 toolchain, pinned in lean-toolchain.
  • Lake (bundled with Lean).
  • PFR, pinned by Git revision in lakefile.toml. PFR supplies the entropy API used throughout.
  • Mathlib (via PFR), downloaded as prebuilt artifacts by the bootstrap script. Never built from source in CI or in a fresh worktree.

References

Primary sources and transcriptions live in references/. The paper being formalized and the principal background text:

Other closely related texts:

  • Zhang and Yeung (1997). A non-Shannon-type conditional inequality of information quantities. IEEE Transactions on Information Theory 43(6). (references/papers/zhangyeung1997.pdf.) This is the primary reference for the M1.5 Theorem 2 proof via Kullback-Leibler divergence.
  • Yeung (2008). Information Theory and Network Coding. Springer. DOI: 10.1007/978-0-387-79234-7. (Cross-reference for Theorem 3 and the copy lemma.)

Additional entries are listed in references/README.md.

AI Statement

This formalization is being completed with substantial assistance from Opus 4.6 + 4.7 and GPT 5.4, through claude and opencode, and GitHub Copilot.

License

Copyright 2026 Christopher Boone.

This repository carries REUSE-style mixed-license coverage:

  • Lean code, the bin/bootstrap-worktree shell script, and hand-authored infrastructure config (Makefile, lakefile.toml, YAML, JSONC, dotfiles, cspell-words.txt, VS Code editor config) under Apache 2.0. Lean files and bin/bootstrap-worktree carry inline SPDX headers; the remaining infrastructure/config files are covered by the REUSE.toml config-group annotation.
  • Project-authored prose (READMEs, agent configs, planning documents, research notes, formalization reviews, this README, the NOTICE) under CC BY 4.0 via the REUSE.toml prose-group annotation.
  • CODE_OF_CONDUCT.md, adapted from the Contributor Covenant via the Organization for Ethical Source, under CC BY-SA 4.0 via a dedicated REUSE.toml override annotation.
  • Generated artifacts (lake-manifest.json, lean-toolchain) dedicated to the public domain under CC0 1.0 via the REUSE.toml generated-group annotation.
  • Bundled third-party reference material under references/papers/** and references/transcriptions/** under LicenseRef-Reference-Material. Copyright remains with the original authors and publishers; per-publication attribution (IEEE, original authors) is preserved via per-file .license sidecars rather than a REUSE.toml glob.

See NOTICE and the combination of per-file SPDX headers plus root REUSE.toml annotations for the authoritative coverage. See https://reuse.software/ for tooling.

Contributors

Languages