A relational small-step / big-step semantics of the Ethereum Virtual
Machine in Lean 4, mirroring the structure of
NethermindEth/EVMYulLean
but expressed as Prop-valued inductive relations rather than
executable functions, so that reasoning is more direct.
Provenance. This package was mostly AI-generated (Claude) in collaboration with a human reviewer. Treat the design and proofs as a draft: the structure has been thought through, the build is green, the demo runs, and a substantial portion of the soundness lemmas are closed — but expect rough edges, especially in the deferred proof obligations. Not for production use.
| Phase | Description | State |
|---|---|---|
| 1 | Toolchain (Lean 4.31.0 + Mathlib v4.31.0) + foundation types | ✅ |
| 2 | Operation ADT (including EIP-8024 DUPN/SWAPN/EXCHANGE) and bytecode decoder |
✅ |
| 3 | Halted-state flag + ExecutionResult projection |
✅ |
| 4 | Small-step success rules (Step, 81 constructors) |
✅ |
| 5 | Small-step exception rules (Step, 9 generic constructors) |
✅ |
| 6 | Big-step relation (Eval) + reflexive-transitive closure Steps |
✅ |
| 7 | Executable shadow (stepF) + demo (Main.lean) |
✅ |
| 8 | Soundness stepF s = .ok s' → Step s s' |
✅ (no sorry) |
| 9 | Real Keccak-256 (EvmSemantics.Crypto.Keccak256, wired via @[implemented_by]) |
✅ |
| 10 | Plain CALL opcode (per-call-frame stack, EIP-150 forwarding, value stipend, static-mode guard, three callReturn* rules) |
✅ |
Demo (Main.lean) runs PUSH1 5 ; PUSH1 3 ; ADD ; STOP through the
executable shadow, producing stack [8] and halt = Success. Confirms
the relation/executable pair is at least internally consistent on a
trivial program.
- Multi-frame EVM: all arithmetic, comparison/bitwise, KECCAK256,
environmental reads, block-context reads, memory, storage (incl.
transient), stack manipulation (POP, PUSH0–PUSH32, DUP1–16, SWAP1–16),
control flow (JUMP, JUMPI, JUMPDEST, PC, GAS), halts (STOP, RETURN,
REVERT, INVALID), logging (LOG0–LOG4), EIP-8024 (DUPN, SWAPN, EXCHANGE),
and plain
CALLwith EIP-150 63/64 forwarding, value stipend, depth/balance pre-check, static-mode value-transfer rejection,returnDataclearing on pre-execution failure, and a list-backed call-frame stack with three resume rules (callReturnSuccess/callReturnRevert/callReturnException). - Excluded from v1:
CALLCODE/DELEGATECALL/STATICCALL,CREATE/CREATE2,SELFDESTRUCT, transaction processing (Υ), block validation, precompiled contracts, RLP encoding. - Gas: parameterised by EVM hard fork (
EvmSemantics.Fork, threaded throughExecutionEnv.fork).Gas.baseCost fork opreturns the static Yellow-Paper fee per fork (Constantinoplematches the legacy ethereum/tests corpus — Frontier-era SLOAD = 50, EXP per-byte = 10;Cancunuses the modern warm-priced reads and Spurious-Dragon EXP). All major dynamic costs are also modelled: memory expansion (chargeMem/chargeMem2, Yellow-Paper quadratic),Gas.sstoreCost(pre-EIP-1283 for Constantinople / EIP-2200 for Cancun, with the EIP-2200 stipend sentry viaGas.sstoreSentry),Gas.copyWordCost,Gas.keccakWordCost,Gas.logDataCost,Gas.expByteCost. The relationalStepRunning.outOfGasis generalised to accept acost : Natwitness withGas.baseCost ≤ cost, so dynamic-cost OOG (memory expansion, sstoreCost, per-word/byte/topic charges) is expressible. The only remaining unmodelled costs are the EIP-2929 cold/warm split forBALANCE/EXTCODESIZE/EXTCODECOPY/EXTCODEHASH(stubbed pending anaccessedAccountsset inSubstate) and the out-of-scope CALLCODE / DELEGATECALL / STATICCALL / CREATE / SELFDESTRUCT family. (PlainCALLis modelled — its base fee, memory expansion, value/new-account surcharge viaGas.callSurcharge, and 63/64 forwarding viaGas.allButOneSixtyFourthare all charged.) Schedule changes need to stay in lockstep acrossStep,stepF, the soundness proof, andVMRunner.gasComparableOpcode. - World state: modelled as plain functions, not hash maps —
Storage = UInt256 → UInt256,AccountMap = AccountAddress → Account, address sets asα → Prop. This trades enumerability for clean algebraic reasoning (Function.update, extensionality,simp). - Address space:
AccountAddress = Fin (2^160)— the real 20-byte EVM address space.
EvmSemantics.lean -- root re-exports
Main.lean -- demo executable
EvmSemantics/
Data/
UInt256.lean -- 256-bit words, modular arithmetic
-- (the operand stack is plain `List UInt256`)
State/
Account.lean -- AccountAddress, Storage, Account, AccountMap
BlockHeader.lean -- block-context fields read by BLOCK ops
ExecutionEnv.lean -- per-frame execution environment I
Substate.lean -- accrued substate A (logs, accessed sets, refunds)
Machine/
MachineState.lean -- machine state μ (gas, memory, returnData)
SharedState.lean -- world+machine bundle
EVM/
Operation.lean -- 14-constructor Operation ADT, + EIP-8024
Decode.lean -- byte → Operation + immediate decoder
Gas.lean -- gas cost (real base fees; dynamic parts stubbed)
Exception.lean -- 8-variant ExecutionException
State.lean -- EVM.State (pc, stack, halt, ...)
Halted.lean -- ExecutionResult + State.toResult
Step.lean -- Step wrapper + StepRunning/StepReturn rules
BigStep.lean -- reflexive-transitive Steps, big-step Eval
StepF.lean -- executable shadow, split by Operation group
Equiv.lean -- soundness lemmas (helper + headline)
lake build # compile library + executable
.lake/build/bin/evm_semanticsA lake exe cache get is recommended after the first lake update to
fetch Mathlib's precompiled .olean artifacts. The cold build is
~10 minutes; cached, ~30 seconds.
lakefile.toml registers Batteries' runLinter script as the project's
lint driver — the same one Mathlib uses for its own CI gate. Run it with:
lake lintIt runs the Batteries lint suite (missing doc-strings, simpNF, unused
arguments, dangerous instances, etc.) on every declaration under the
EvmSemantics namespace.
There is intentionally no scripts/nolints.json allow-list file —
all findings are addressed in source: short doc-strings everywhere, and
@[nolint unusedArguments] / attribute [nolint ...] annotations on
the handful of intentional exceptions (Gas.sstoreCost's ignored _original,
State.consumeGas's proof-witness _h, the auto-derived Repr.repr
declarations from deriving Repr, the trivial Keccak.injEq from a
single-constructor deriving DecidableEq, and the inner-loop helpers
generated by let rec).
CI (.github/workflows/ci.yml) runs both lake build (gated to fail
on any warning) and lake lint on every push and PR.
Step : EVM.State → EVM.State → Prop(small-step). A thin wrapper with two constructors —running(guards aStepRunningderivation withs.halt = .Running) andreturning(wraps aStepReturn). The per-opcode logic lives in:StepRunning— 90 constructors (81 success, one per opcode, + 9 generic exception constructors parametric over the operation). Noh_runningpremise on any of them; the guard is consumed once on theStep.runningwrapper.StepReturn— 3callReturn*constructors for popping the caller frame when a child halts. Each pins the concrete halt kind and the non-empty call stack.
Eval : EVM.State → ExecutionResult → Prop(big-step). Defined as the reflexive-transitive closure ofStepending in a halted state, projected viaState.toResultto a flatsuccess | returned _ | reverted _ | exception _sum.stepF : State → Except ExecutionException State(executable shadow). MirrorsStepopcode-by-opcode. Split into per-group helpers (stepF.stopArith,stepF.compBit, …) so each piece is small and individually reasoned about.
Most success constructors of StepRunning follow this anatomy (stop carries
only h_op — though RETURN/REVERT keep h_gas/h_stack/h_mem — and
stackless reads omit h_stack):
| add (s : State) (a b : UInt256) (rest : List UInt256)
(h_op : s.decodedOp = some .ADD)
(h_gas : Gas.baseCost s.fork .ADD ≤ s.gasAvailable)
(h_stack : s.stack = a :: b :: rest)
: StepRunning s
((s.consumeGas (Gas.baseCost s.fork .ADD) h_gas).replaceStackAndIncrPC
((a + b) :: rest))(gasAvailable is a Nat, so the gas premise is a plain Nat ≤; the operand
stack is List UInt256. s.decodedOp is the op-only projection of s.decoded
— pushN is the one rule that uses the full s.decoded, since it consumes the
PUSH immediate.)
consumeGas takes the gas-sufficiency proof as an explicit argument so
the saturating Nat subtraction is provably safe — no truncation
case-splits in downstream proofs.
The EVM.State carries a halt : HaltKind field. The Step.running
wrapper carries s.halt = .Running as its precondition, and each
StepReturn constructor pins a concrete non-Running halt kind via
h_halt, so a done state (halted with empty call stack) has no
successors under Step (proven uniformly via Step.not_from_done).
This keeps Step as a plain binary relation while still letting Eval
emit a structured result.
EVM/Equiv.lean establishes stepF_sound : stepF s = .ok s' → Step s s'
without any sorry. The proof is layered:
- Headline theorem
stepF_sound— unfoldsstepF, splits on halt/decode/gas, then dispatches to the 14 per-helper soundness lemmas based on the top-levelOperationconstructor. - Per-helper soundness lemmas — all 14 closed:
stopArith_sound,compBit_sound,keccak_sound,env_sound,block_sound,system_sound,stackMemFlow_sound,push_sound,log_sound,dup_sound,swap_sound,dupN_sound,swapN_sound,exchange_sound. - Supporting lemma
popN_correct(inStepF.lean) — by induction onk, shows that ifpopN stk k = some (topics, rest)thentopics.length = kandstk = topics ++ rest. Used bylog_soundto recover the list-of-topics witness needed byStepRunning.log.
A small design tweak was needed to make the proof go through:
StepRunning.pushN now takes the immediate-width as an explicit parameter
(immWidth : Nat) rather than tying it to k.val, sidestepping a
decoder invariant that would otherwise need a separate lemma.
- The opcode list, state-record layout, and per-instruction semantics follow NethermindEth/EVMYulLean closely. Anything ported verbatim should be attributed to that project.
- The Yellow Paper section numbers cited in comments correspond to the Cancun-era Ethereum spec.
Apache2, as specified in LICENSE-APACHE2.