How the EvmSemantics codebase is organized and what happens where. For the
operational quick-reference (commands, conventions, opcode-change checklist) see
AGENTS.md; for the conformance harness see VMTESTS.md. This document explains
the structure — the layers, the data flow of an execution step, and the
three-views-plus-soundness design at the heart of the project.
The project defines EVM execution three times and proves the executable definition agrees with the relational one:
Step— aProp-valued small-step relation (the specification, the source of truth).Eval— a big-step relation built as the reflexive-transitive closure ofStep, projected to a flat result.stepF— anExcept-valued executable function shadowingStepopcode-by-opcode (the thing the demo and the test harness actually run).
EVM/Equiv.lean closes stepF_sound : stepF s = .ok s' → Step s s' with no
sorry, so any successful stepF step is backed by a derivation in the
relational spec. This covers only the .ok s' path — stepF's .error
(exception) results are not in general matched by a Step successor; the
memory-expansion OutOfGas case (chargeMem → .error, with no Step-side
memory-OOG rule) is a current example.
Modules form a clean dependency stack — foundation → state → machine → EVM core
→ semantics → executables. Everything is re-exported through the root
EvmSemantics.lean.
graph TD
subgraph Foundation
UInt256["Data/UInt256.lean<br/>256-bit modular words,<br/>arith · bitwise · expFast"]
end
subgraph "State (world + frame pieces)"
Account["State/Account.lean<br/>AccountAddress=Fin 2^160,<br/>Storage · Account · AccountMap<br/>(function-backed maps)"]
BlockHeader["State/BlockHeader.lean<br/>BlockHeader (block context)"]
ExecEnv["State/ExecutionEnv.lean<br/>ExecutionEnv I<br/>(code · calldata · origin · header)"]
Substate["State/Substate.lean<br/>Substate A<br/>(logs · accessed sets · refunds)"]
end
subgraph Machine
MachineState["Machine/MachineState.lean<br/>MachineState μ<br/>(gas · memory · returnData),<br/>mload/mstore/mcopy · memCost"]
SharedState["Machine/SharedState.lean<br/>SharedState = μ + world<br/>(accountMap · substate · env)"]
end
subgraph "EVM core"
Operation["EVM/Operation.lean<br/>Operation ADT (grouped)<br/>+ match_pattern abbrevs"]
Exception["EVM/Exception.lean<br/>ExecutionException"]
State["EVM/State.lean<br/>EVM.State = SharedState<br/>+ pc · stack · halt"]
Decode["EVM/Decode.lean<br/>byte → Operation + immediate<br/>(end-of-code ⇒ STOP)"]
Gas["EVM/Gas.lean<br/>Gas.baseCost : Fork → Operation → Nat"]
Halted["EVM/Halted.lean<br/>ExecutionResult<br/>+ State.toResult"]
end
subgraph "Semantics (three views + proof)"
Step["EVM/Step.lean<br/>Step wrapper (small-step Prop)<br/>StepRunning opcode rules · StepReturn resume rules"]
BigStep["EVM/BigStep.lean<br/>Steps (rtc) · Eval (big-step)"]
StepF["EVM/StepF.lean<br/>stepF executable shadow<br/>+ 14 per-group helpers"]
Precompile["EVM/Precompile.lean<br/>YP §9 dispatch (0x04 identity)"]
Equiv["EVM/Equiv.lean<br/>stepF_sound (no sorry)<br/>+ 14 helper lemmas"]
end
subgraph Executables
Main["Main.lean<br/>demo: run = fuel loop over stepF"]
VMRunner["tests/VMRunner.lean (exe vmtests)<br/>JSON harness + fuel loop"]
end
UInt256 --> Account & BlockHeader & ExecEnv & Substate & MachineState
Account --> BlockHeader & ExecEnv & Substate & SharedState
BlockHeader --> ExecEnv
ExecEnv --> SharedState
Substate --> SharedState
MachineState --> SharedState
SharedState --> State
UInt256 --> State
Exception --> State
Operation --> Decode & Gas
UInt256 --> Decode
State --> Step
Decode --> Step
Gas --> Step
State --> Halted
Step --> BigStep & StepF
Halted --> BigStep
Step --> Equiv
StepF --> Equiv
BigStep --> Equiv
Precompile --> Step & StepF
Step -.->|re-export| Main
StepF --> Main & VMRunner
Foundation
Data/UInt256.lean— the 256-bit wordUInt256(a wrapper overFin (2^256)) with modularadd/sub/mul/div/mod/addMod/mulMod, bitwiseland/lor/xor/lnot/shiftLeft/shiftRight, and two exponentiations:exp(the cleana^b % 2^256spec, used byStep) andexpFast(modular fast-exponentiation, used bystepF), withexpFast_eq_expproving them equal. There is no separateStackmodule — the operand stack is justList UInt256.
State (world + per-frame pieces) — all maps are functions, not hash maps,
trading enumerability for clean algebraic reasoning (Function.update, simp):
Account.lean—AccountAddress = Fin (2^160),Storage = UInt256 → UInt256, theAccountrecord, andAccountMap = AccountAddress → Account, each withget/setand@[simp]get-set lemmas.BlockHeader.lean— the block-context fields BLOCK opcodes read.ExecutionEnv.lean— the immutable per-frame environmentI(code, calldata, origin/caller/address, value, gas price, header, depth,permitStateMutationfor static-call detection).Substate.lean— the accrued substateA:LogSeries, accessed-account and accessed-storage-key sets (AddressSet/StorageKeySet, also functions toProp), and refund counter.
Machine
MachineState.lean— the machine stateμ:gasAvailable,memory(aByteArray),activeWords,returnData,hReturn. Owns memory access (mload/mstore/mstore8/mcopy,readPaddedwith zero-padding) and the Yellow-Paper quadratic memory-cost machinery (memCost,memExpansionDelta).SharedState.lean— bundlesMachineStatewith the world (accountMap,substate,executionEnv) viaextends.
EVM core
Operation.lean— theOperationADT, deliberately grouped into sub-inductives (StopArithOps,CompareBitwiseOps,KeccakOps,EnvOps,BlockOps,StackMemFlowOps,SystemOps) and structs (PushOp,DupOp,SwapOp,DupNOp,SwapNOp,ExchangeOp,LogOp).@[match_pattern]abbrevs (STOP,ADD, …) let the rest of the code name flat opcodes while the grouping letsstepF/Step/Equivdispatch per-group. EIP-8024DUPN/SWAPN/EXCHANGElive here.Exception.lean—ExecutionException(stack underflow/overflow, OOG, invalid jump, static-mode violation, …).State.lean—EVM.StateextendsSharedStatewithpc,stack : List UInt256,execLength,halt : HaltKind(Running | Success | Returned | Reverted | Exception e), and the call-frame stack. The spec-side rules use a flat{ s with ... }record-update post-state (every field the opcode touches is named directly), soStateitself carries noconsumeGas/replaceStackAndIncrPChelpers — those live inStepF.lean, where the executable shadow charges gas in chained form.Decode.lean—opcodeOf : UInt8 → Option OperationanddecodeAt : ByteArray → pc → Option (Operation × Option (UInt256 × Nat))returning the operation plus any PUSH immediate (value + width). Reading past code end decodes asSTOP(Yellow-Paper zero-padding). AlsoisValidJumpDest : ByteArray → Nat → Bool— the push-data-aware jumpdest analysis used by JUMP/JUMPI: scans the code from pc 0, skipping each opcode's immediate bytes, and acceptstargetonly when it is reached as an instruction boundary andcode[target] = 0x5b.Gas.lean—Gas.baseCost : Fork → Operation → Nat, the static per-opcode fee parameterised by the hard fork (Constantinople/Cancun). Alongside, several dynamic cost helpers (also fork-aware where it matters):Gas.sstoreCost fork original current new— pre-EIP-1283 (Constantinople) or EIP-2200 net-metered (Cancun);Gas.copyWordCost size—3 · ⌈size/32⌉for all five copy opcodes;Gas.keccakWordCost size—6 · ⌈size/32⌉for KECCAK256;Gas.logDataCost size—8 · sizefor LOG;Gas.expByteCost fork exponent— Frontier (10) or post-Spurious-Dragon (50) per exponent byte;Gas.sstoreSentry fork gas— the EIP-2200 stipend (≤ 2300 → OOG) for Cancun only;Gas.callSurcharge valueNZ calleeEmpty— the CALL value/new-account surcharge (9000if value ≠ 0,+25000if calling an empty account); CALLCODE passescalleeEmpty = falsesince it never creates an account (the code is borrowed; the storage stays with the caller);Gas.allButOneSixtyFourth gas— EIP-150 forwarding cap (gas - gas/64). Memory expansion gas is charged separately bystepF.chargeMem/chargeMem2. The only remaining dynamic costs we don't yet model are the EIP-2929 cold/warm split onBALANCE/EXTCODESIZE/EXTCODECOPY/EXTCODEHASH(these are stubbed at1/100with aTODOcomment pending anaccessedAccountsset inSubstate) and the out-of-scope cold/warm split family and the CALL-family dynamic surcharge. The legacy ethereum/tests corpus uses Frontier rules for SELFDESTRUCT (cost 0, noG_newaccountsurcharge) — same convention as our Frontier-rate SLOAD = 50 and EXP per-byte = 10 — so on theConstantinopleforkGas.baseCost .SELFDESTRUCT = 0andGas.selfDestructSurcharge .Constantinople _ _ = 0. Modern values (5000 + 25000) live onCancun. Every VMTests test runs with its declaredexec.gasbudget; the remaining-gasvalue is compared against the corpus whenever apostblock is present. SeeVMTESTS.mdfor the breakdown.
Halted.lean—ExecutionResultandState.toResult, projecting a haltedStateinto the flat success/returned/reverted/exception sum.Fork.lean—inductive Fork = Constantinople | Cancun; the active fork is carried onExecutionEnv.forkand threaded throughGas.baseCost/Gas.sstoreCost/Gas.expByteCost.
Crypto (EvmSemantics/Crypto/)
Keccak256.lean— a self-contained implementation of the original Keccak hash function (the variant Ethereum uses, delimiter0x01, not NIST SHA-3's0x06): the Keccak-f[1600] permutation (state = 25 × 64-bit lanes, 24 rounds of θ-ρ-π-χ-ι), the sponge driver specialised to Keccak-256 (rate 1088 bits, capacity 512 bits, 256-bit output), and akeccak256Impl : ByteArray → UInt256that packs the 32-byte digest big-endian. The file then declaresopaque keccak256 : ByteArray → UInt256wired tokeccak256Implvia@[implemented_by]. The relationalSteprules see only the opaque signature, so soundness is independent of the hash; the executablestepFruns the real thing.
Semantics — see the next two sections.
Executables
Main.lean—initState+ apartial def runfuel loop overstepF; the demo runsPUSH1 5; PUSH1 3; ADD; STOP.tests/VMRunner.lean(exevmtests) — the conformance harness; see below.tests/KeccakTest.lean(exekeccak_test) — differential check that ourKeccak256agrees with well-known Ethereum hash vectors (empty input,"abc", the ERC-20transfer(address,uint256)selector).
stepF (and the relation Step it shadows) turn one running State into the
next; the surrounding run loop owns the halt guard, since stepF itself just
errors on an already-halted state. The flow of one run iteration:
flowchart TD
Start([run iteration: State s]) --> Halt{s.halt = Running?}
Halt -->|no| Done([run returns s; loop ends])
Halt -->|yes| Dec["stepF s → decodeAt code pc<br/>(Decode.lean)"]
Dec -->|"none: unassigned byte"| Invalid["error InvalidInstruction"]
Dec -->|"some (op, imm)<br/>(past code end ⇒ STOP)"| GasChk{"Gas.baseCost fork op<br/>≤ gasAvailable?"}
GasChk -->|no| OOG["error OutOfGas"]
GasChk -->|yes| Consume["consumeGas<br/>(proof-carrying)"]
Consume --> Dispatch{"dispatch on<br/>Operation group"}
Dispatch --> H1["stopArith"]
Dispatch --> H2["compBit"]
Dispatch --> H3["env / block"]
Dispatch --> H4["stackMemFlow<br/>(mem · storage · jumps)"]
Dispatch --> H5["push/dup/swap/<br/>dupN/swapN/exchange"]
Dispatch --> H6["log · keccak · system"]
H1 & H2 & H3 & H4 & H5 & H6 --> Outcome{stack/mem ok?}
Outcome -->|underflow / bad jump / etc.| Err["error ExecutionException"]
Outcome -->|ok| Next["ok s' :<br/>new stack/pc/mem/halt"]
The dispatch arms map one-to-one to the stepF.* helpers in StepF.lean
(stopArith, compBit, keccak, env, block, stackMemFlow, push,
dup, swap, dupN, swapN, exchange, log, system) — and one-to-one to
the soundness lemmas in Equiv.lean. The halting opcodes are ordinary ok s'
outcomes that set s'.halt (STOP ⇒ Success in stopArith; RETURN/REVERT ⇒
Returned/Reverted in system) — including the implicit STOP from running
off the end of the code. Neither stepF nor Step loops on its own; iteration
to a halt is the run fuel loop in each executable, which is also what skips
already-halted states — stepF called directly on a non-Running state just
returns .error .InvalidInstruction.
All four inter-contract opcodes — CALL, CALLCODE, DELEGATECALL, and
STATICCALL — are implemented. They live in stepF.system's matching
arms with StepRunning.call / callFail / callStatic, callcode /
callcodeFail, delegatecall / delegatecallFail, and staticcall /
staticcallFail constructors, all wrapped by Step.running in the
combined relation. The four kinds share a common skeleton (see the
CallKind enum + State.calleeEnvFor / State.enterCallFor helpers in
State.lean); per-kind differences are isolated to the calleeCodeOwner
/ calleeSource / calleeWeiValue / calleePermit / transfersValue
projections. Call-frame state is kept on a
per-State callStack : List Frame (defined in State.lean): each Frame
snapshots the caller's pc, stack, gasAvailable, activeWords, memory,
returnData, executionEnv, the retOffset/retSize window the caller
asked for, and the world snapshot (snapAccountMap, snapSubstate) used to
roll back on revert / exception.
The CALL arm fires in this order:
- Static-mode check — if
¬ permitStateMutation ∧ value ≠ 0, halt with.StaticModeViolation(mirrored byStepRunning.callStatic). Zero-value CALLs remain permitted in static frames. - Memory expansion —
chargeMem2for the union of the args range and the return range. - Surcharge —
Gas.callSurcharge(9000 if value ≠ 0; +25000 if calling an empty account). - Depth/balance pre-check — if
depth ≥ 1024 ∨ caller.balance < value, the call is not taken: push0, clearreturnData, advance PC, keep the unspent forwarded gas. (StepRunning.callFail.) - 63/64 forwarding —
Gas.allButOneSixtyFourthcaps the gas the callee receives; the value stipend is added for non-zero-value calls. - Enter callee —
State.enterCallsnapshots the caller frame ontocallStack, transfersvalue, installs the callee env, and clearsmemory/returnData/hReturn.
The CALLCODE arm runs the same six-step pipeline with three differences:
(1) no static-mode check — the "transfer" is caller→caller and the
opcode is therefore not a state mutation at this site (state-mutating
opcodes inside the callee are still rejected because permitStateMutation
propagates); (2) surcharge passes targetEmpty = false — CALLCODE
never creates a new account; (3) enter callee passes the caller's own
address as the call target, so enterCall's self-transfer is a balance
no-op and the callee's address stays the caller, while calleeCode is
read from the target account so the borrowed code is what executes.
DELEGATECALL and STATICCALL both pop six stack items (no value),
skip the surcharge entirely (Gas.callSurcharge false false = 0), perform
no balance / value transfer, and have no *Static constructor — they
cannot mutate value directly. They share the same pipeline (memory
expansion → depth-limit pre-check → 63/64 forwarding → enterCallFor):
DELEGATECALLsetsCallKind.DelegateCall: the callee inherits the caller'scaller(msg.sender) andweiValue(CALLVALUE), runs in the caller's storage context, executes the target's code.STATICCALLsetsCallKind.StaticCall: the callee runs in the target's context (address = target) but withpermitStateMutationforced tofalse, so any state-mutating opcode in the new frame is rejected.CALLVALUEis forced to0.
YP §9 reserves addresses 0x01..0x09 for precompiles: native operations
the EVM invokes in place of stored bytecode. The dispatcher lives in
EvmSemantics.EVM.Precompile.run : AccountAddress → ByteArray → Nat → Result and returns one of three outcomes — .success output gasUsed,
.outOfGas, or .notAPrecompile (target isn't in the modelled set, fall
through to the normal enterCall path). Currently implemented:
0x04 identity (runIdentity, gas 15 + 3·⌈|input|/32⌉). The other
eight arms (0x01 ecrecover, 0x02 sha256, 0x03 ripemd160, 0x05 modexp, 0x06–0x09 BN254/BLAKE2F) are stubbed with explicit
.notAPrecompile so a call into them STOPs cleanly; adding a new
precompile is a two-line edit (runFoo + dispatch arm).
Frame-entry layer. The dispatch lives in one place — a single
arm at the top of stepFE's running branch — rather than being
duplicated across the four CALL-family handlers. To make that single
arm correct for every entry path, each frame's ExecutionEnv carries a
codeAddr field: the borrowed-from address whose bytecode is
loaded into the frame.
CALL/STATICCALLsetcodeAddr = address = tgt.CALLCODE/DELEGATECALLsetcodeAddr = tgt, butaddressstays the caller's (those two preserve the caller's storage / identity context). WithoutcodeAddra CALLCODE/DELEGATECALL into a precompile would be invisible to a check that only looked ataddress.Tx.buildInitStatesetscodeAddr := tx.recipientfor the top-level frame, so a transaction to a precompile fires the dispatch on its very first step.- A CREATE/CREATE2 init-code frame sets
codeAddr := newAddr(the freshly-derived address; cannot collide with0x01..0x09in practice).
The dispatch itself. With codeAddr in place, stepFE's running
branch is just:
match Precompile.run s.executionEnv.codeAddr s.executionEnv.calldata
s.gasAvailable with
| .success out gasUsed =>
.ok { s with halt := .Returned, hReturn := out,
gasAvailable := s.gasAvailable - gasUsed }
| .outOfGas =>
.ok { s with halt := .Exception .OutOfGas, hReturn := empty,
gasAvailable := 0 }
| .notAPrecompile =>
/- fall through to the normal bytecode dispatch -/
On .success the frame is marked .Returned; the next iteration's
resumeByHalt → resumeSuccess copies hReturn into the caller's
return window and refunds unspent gas. On .outOfGas the frame is
marked .Exception .OutOfGas; resumeByHalt → resumeException rolls
the world back via the frame's snapshot (undoing the value transfer
the surrounding CALL applied) and pushes 0. On .notAPrecompile
(any address not in the implemented set, including non-precompiles)
the normal bytecode dispatch runs as before.
Spec side. The dispatch lives at the Step layer (not inside
StepRunning), as three top-level constructors:
Step.running— the bytecode arm. Carriess.halt = .Runningand an explicit gatePrecompile.isPrecompile s.executionEnv.fork s.executionEnv.codeAddr = falseso a precompile-frame state can never derive a bytecode transition.Step.precompileSuccess/Step.precompileOog— the precompile arms. Each takess.halt = .Runningplus a positivePrecompile.isPrecompile … = truewitness plus aPrecompile.run … h_isPrec = .success/.outOfGas …outcome witness. Note thatPrecompile.runitself takes theisPrecompileproof as a precondition — it has no.notAPrecompilearm, because the precondition rules that case out by construction.
The gate is what gives the relation exclusivity between the two
sides: without it Step would over-derive (e.g. a precompile address
has empty bytecode that decodes to STOP, so StepRunning.stop
would also fire alongside Step.precompileSuccess). With the gate,
exactly one of the three arms is derivable for any running state.
The Bool predicate Precompile.isPrecompile : Fork → AccountAddress → Bool is the single source of truth for "is this address a
precompile we model?" — used by Step.running, the precompile
arms, and stepFE's dispatch, so the three stay in lockstep by
referencing the same definition.
The four CALL-family StepRunning constructors (call, callcode,
delegatecall, staticcall) are unchanged: they push the frame the
usual way, and the next small step fires either the precompile arm
or the bytecode arm depending on the new frame's codeAddr. This is
the user-suggested factoring — an intermediate layer between the
caller's execution step and the callee's first step — and it
collapses what would have been eight op-specific precompile
constructors into two generic ones.
The proof of stepFE_sound adds a single split at h on
Precompile.run at the top of the running branch, dispatching the
three arms to the two precompile rules and (for .notAPrecompile)
to the existing per-operation logic — passing the
.notAPrecompile witness as Step.running's gate premise.
Tx.execute needs no precompile special-case at all — the
generic frame-entry rule handles tx-to-precompile transactions on
the first step of the stepF loop.
SELFDESTRUCT lives in stepF.system .SELFDESTRUCT with matching
StepRunning.selfDestruct / StepRunning.selfDestructStatic
constructors. It pops the beneficiary address, rejects under
permitStateMutation = false, charges base G_selfdestruct = 5000
plus the Gas.selfDestructSurcharge (25000 iff the beneficiary is
empty and self has non-zero balance), then calls
State.selfDestructTo. That helper credits the beneficiary with
self's balance and zeroes self's balance in credit-then-debit order
(not AccountMap.transfer's set-then-set, which would net-cancel a
self-beneficiary's update and leave the balance unchanged instead of
burning it). It also marks self in Substate.selfDestructSet and
adds R_selfdestruct = 24000 to the refund counter on Constantinople
(0 on Cancun pending EIP-6780). The frame halts with .Success. The
account is not deleted at this site — pre-Cancun semantics defer
deletion to end-of-transaction; for our single-tx test corpora the
post-state comparison only enumerates accounts listed in the test's
post, so leaving the self-destructed account's storage/code in place
is observably equivalent to immediate deletion.
CREATE and CREATE2 open an init-code sub-frame whose address
is the freshly-derived newAddr and whose code is the init bytes
read from caller memory. The frame is marked on the call stack by a
new field Frame.createAddr : Option AccountAddress (none for CALL
frames, some addr for CREATE frames), and State.resumeByHalt
routes the child halt through one of three new CREATE-specific resume
helpers — resumeCreateSuccess (deploy hReturn as addr's code,
charging G_codedeposit = 200 · |code| from the child's remaining gas;
if unaffordable, treat as exception and roll back),
resumeCreateRevert (rollback, push 0, keep hReturn as
returnData), and resumeCreateException (rollback, push 0, refund
nothing). The pushed value on success is addr.toUInt256, not the
CALL-family 1/0 flag.
Address derivation:
- CREATE:
keccak256(rlp([sender, sender.nonce]))[12:]. The RLP encoder lives inEvmSemantics/Data/Rlp.leanand only handles the[20-byte address, uint nonce]shape (short-string + short-list paths only; the>55 bytelength-of-length prefixes are not reachable here since the encoded list is ≤ 30 bytes). - CREATE2:
keccak256(0xff || sender(20) || salt(32) || keccak256(initcode)(32))[12:]. CREATE2 additionally paysGas.create2HashCost size = 6 · ⌈size/32⌉for the init-code keccak.
The pre-frame pipeline (memory expansion → depth/balance pre-check →
63/64 forwarding → State.enterCreate) bumps the caller's nonce by 1
before installing the child frame (the address derivation has
already used the pre-bump nonce); transfers value to the new
account; sets the new account's nonce to 1 (EIP-161 "exists" marker
so the account isn't isEmpty). On Frame.snapAccountMap rollback,
both the nonce bump and the value transfer are undone.
Address-collision detection uses a Bool-valued helper
Account.isContract (= code.size != 0 || nonce.toNat != 0,
stricter than Account.isEmpty because balance is excluded — per the
Yellow Paper a pre-funded address with no code and nonce = 0 is
still a valid creation target). The CREATE / CREATE2 arms dispatch
via match … .isContract with | true => … | false => …; on
true the caller's nonce is still bumped, 0 is pushed, and no
transfer or frame entry occurs (forwarded gas is also not spent). The
discriminator is Bool rather than Prop so the Equiv-proof's
split at h produces clean true/false cases instead of going
through a Decidable instance for ByteArray.size = 0 that the
elaborator normalises to ByteArray = ByteArray.empty — that
normalisation tangle is what blocked the first attempt.
When the callee halts the active frame's halt becomes non-Running but
callStack is still non-empty — stepF's halt arm calls State.resumeByHalt
to dispatch on the callee's halt kind:
CALL-family frames (Frame.createAddr = none):
callee halt |
rule | flag | return data | world |
|---|---|---|---|---|
.Success |
resumeSuccess |
1 |
child.hReturn |
keep child's |
.Returned |
resumeSuccess |
1 |
child.hReturn |
keep child's |
.Reverted |
resumeRevert |
0 |
child.hReturn |
snapshot |
.Exception _ |
resumeException |
0 |
∅ |
snapshot |
CREATE-family frames (Frame.createAddr = some addr):
callee halt |
rule | pushed | return data | world after |
|---|---|---|---|---|
.Success |
resumeCreateSuccess |
addr |
∅ |
child + addr.code := hReturn (if `200· |
.Returned |
resumeCreateSuccess |
addr |
∅ |
child + addr.code := hReturn (if affordable; else: rollback) |
.Reverted |
resumeCreateRevert |
0 |
child.hReturn |
snapshot |
.Exception _ |
resumeCreateException |
0 |
∅ |
snapshot |
State.writeReturn copies min(retSize, hReturn.size) bytes back into the
caller's memory — and short-circuits when that count is 0, so a CALL with
retSize = 0 and a huge retOffset does not allocate memory.
An in-frame Except.error from stepF is treated as a callee-side exception
when callStack ≠ []: the run loops in Main.lean, tests/VMRunner.lean, and
tests/StateTestRunner.lean all convert it to { s with halt := .Exception e }
and re-enter stepF so resumeException fires. Only a top-frame error
(callStack = []) propagates as a top-level abort.
flowchart LR
subgraph "Specification (Prop)"
Step["Step s s'<br/>small-step relation<br/>EVM/Step.lean"]
Eval["Eval s result<br/>big-step = rtc of Step<br/>EVM/BigStep.lean"]
Step -->|"Steps = refl-trans closure,<br/>then toResult"| Eval
end
subgraph "Executable (Except)"
StepF["stepF s : Except _ State<br/>EVM/StepF.lean"]
Run["run = fuel loop<br/>Main.lean · tests/VMRunner.lean"]
StepF --> Run
end
StepF -.->|"stepF_sound:<br/>stepF s = ok s' → Step s s'<br/>EVM/Equiv.lean (no sorry)"| Step
Step(EVM/Step.lean) — a thin two-constructor wrapper around the actual per-opcode relation.Step.runningguards aStepRunningderivation withs.halt = .Running;Step.returningwraps aStepReturn(each of whose constructors pins a concrete non-Runninghalt kind and a non-emptycallStack). Splitting the running precondition out onto the wrapper is what lets the ~90StepRunningconstructors omith_runningentirely.StepRunningcarries the per-opcode logic. Each success constructor names its premises explicitly. The typical shape ish_op : s.decodedOp = some .X(wheres.decodedOp : Option Operationis the op-only projection ofs.decoded),h_gas(aNat≤against the bundledGas.<op>Total s ...total —Gas.<op>Totalpacks the static base, memory-expansion delta, and any per-word / per-byte dynamic cost into a singleNatso the same identifier appears on both sides of the rule), and anh_stackshape — but it varies:stopcarries noh_gas/h_stackand stackless reads omith_stack.pushNis the one success rule that uses the fulls.decoded, because it consumes the PUSH immediate. The post-state is a flat{ s with ... }record update; no chainedconsumeGas/replaceStackAndIncrPCcalls on the relational side.StepReturnhas the threecallReturn*resume rules.- In
StepF.lean, the executable shadow charges dynamic gas in chained form viaconsumeGas(which takes the gas-sufficiency proof as an argument so the saturating subtraction is provably safe). The bundledGas.<op>Totalof the relational rule equals the chainedstepFform byNat.sub_add_eq; the soundness proof inEquiv.leanbridges between them withsimp+grind.keccak256is declaredopaqueinCrypto/Keccak256.lean(so the relational rules are independent of any particular hash); the executable evaluator runs the real Keccak-256 thanks to a sibling@[implemented_by keccak256Impl]attribute that points at the self-contained implementation inCrypto/Keccak256.lean. Done states (halted with empty call stack) have no successors underStep(Step.not_from_done).
Eval(EVM/BigStep.lean) —Stepsis the reflexive-transitive closure;Eval s rholds whenStepsreaches a halted state whosetoResultisr.stepF(EVM/StepF.lean) — the same opcode logic as a totalExceptfunction, factored into the per-group helpers.popN/popN_correctrecover list witnesses (used bylog);chargeMem/chargeMem2apply memory-expansion gas so a huge offset hitsOutOfGasbefore any allocation.Equiv(EVM/Equiv.lean) — the bridge. 14 per-helper lemmas (stopArith_sound, …) each invert a helper'smatchand either apply the matchingStepconstructor or derive a contradiction from a.okhypothesis on an.errorpath; the headlinestepF_soundsplits on halt/decode/gas/operation and dispatches to them. Also exportsEval.halted_inv.
The vmtests executable runs the legacy ethereum/tests VMTests corpus
against stepF via its run fuel loop (cap 2_000_000):
- Parse a test JSON → build the initial
State(buildStateWith/mkAccount, hex helpers). - Run every test through
stepFwith the test's declaredexec.gasbudget — there is no pre-scan or skip filter. The four call-family opcodes (CALL/CALLCODE/DELEGATECALL/STATICCALL) are implemented in the evaluator; the separatestatetestsexe (tests/StateTestRunner.lean) exercises them against a curated 30-dir subset of the BlockchainTests GeneralStateTests (seeVMTESTS.mdfor the list). - Run to a halt, then compare (
cmpAccounts) storage, return-data, balance, and nonce against the expected post-state, producing anOutcome(pass/fail/skip/incon/crash). - Aggregate into a
Tally.runDirfans the files out across in-process LeanTaskworkers (IO.asTask,-j/VMTESTS_JOBS, default 8) — there is no subprocess isolation, so a worker that throws is onecrashbut a hard panic aborts the whole run;--filemode runs a single test in its own process for manual isolation.
CI runs the full suite non-gating against .github/vmtests-baseline.txt. See
VMTESTS.md for results, gas-mode details, and the known evaluator gaps the
suite surfaces.