This document is the first actual implementation blueprint for state_collapser.
It is derived from the accumulated design decisions recorded across:
module_design_desiderata.mdpackage_best_practices_proposal.mdreward_locality_for_quotient_training.mdimplementation_contracts.mdmathematical_model.tex
Those earlier documents preserve the history of design meetings and clarifications.
This document is different.
Its purpose is to state, in one place and at one level of detail, the concrete architecture that should be implemented first.
It should be treated as the current authoritative blueprint for the first implementation pass.
In software development, the closest standard terms for "blueprint" are:
- technical design specification
- architecture specification
- implementation specification
For this repository, the best term is:
- implementation blueprint
because the document must bridge mathematical design, architecture, runtime behavior, and concrete code scaffolding.
This blueprint is for the first implementation pass only.
It covers:
- the hidden graph contract
- the discovered graph and vista layers
- message passing for quotient-style skip knowledge
- contraction policies
- nested cumulative quotient-tier views
- full tower update after each newly discovered state
- reward computation and quotient-level reward aggregation
- runtime snapshots
- a first toy environment
It does not cover:
- full higher-categorical or simplicial path machinery
- final production optimization
- full ROS 2 integration
- final trainer zoo
- universal support for strongly nonlocal reward rules
state_collapser is not primarily:
- a Gymnasium wrapper
- a graph-algorithms utility package
- a Torch-first learning library
- a robotics middleware package
It is primarily:
- a runtime graph-discovery and annotation engine
- over a hidden predicate-defined state-action graph
- with nested quotient-style views
- supporting tower construction during training
The package's core operational surface is:
- an agent moves by primitive actions in the base hidden graph
- the explored graph is updated
- the
1-hop vista is updated - contraction-policy consequences are applied
- the full current tower is updated immediately
- the agent's current position is known across all tiers
- reward is computed from the currently understood tower
gymnasium is the environment-shell boundary.
It provides:
resetstep- observation
- action
- reward
- termination and truncation
It does not provide:
- discovered graph
G^0_t - explored subgraph
H^0_t 1-hop vista graph- contraction/equalization
- quotient views
- tier nesting
- full tower update
- cross-tier position tracking
Therefore the package should use Gymnasium where convenient, but the graph/tower runtime is package-native.
torch is an optional learnable-backend surface.
The first implementation should not require it for:
- graph legality
- explored/vista updates
- contraction policies
- reward aggregation
- runtime tower maintenance
It may later support:
- learned embeddings
- learned contraction scoring
- learned policy heads
- accelerated reward/message computations
ROS 2 remains outside the first implementation scope.
The first implementation should not embed ROS runtime assumptions into core classes.
The following decisions are now fixed for the first implementation pass.
Hidden Graph
The hidden graph is predicate-defined.
It is not the primary source of truth as a generic pre-materialized graph object.
Its truth conditions come from:
- state validity
- action validity
- edge validity
- action application
The base state is:
- a minimal mathematical locus
- immutable
- hashable
- represented first as a frozen dataclass
The state should not itself store changing runtime notes or contraction annotations.
A primitive action is:
- immutable
- hashable
- explicit
- one-step or indecomposable
The contract must support multiple action-encoding styles.
Changing runtime data lives outside the base state.
It lives in node- and tier-indexed annotation stores attached to runtime graph layers.
Contraction is an information-equalization operation.
What is equalized is not merely graph topology.
What is equalized is outgoing-edge possibility knowledge and related local annotations needed for quotient-style skip behavior.
Quotient tiers are not isolated graph worlds.
They are nested cumulative quotient views over one shared discovered graph.
The crucial semantic rule is:
- outgoing-edge and coset information accumulates down the tower
- tier queries are cumulative in a "
<= k" sense, not merely exact-tier-only
Reward is local to primitive transitions in the implementation target region.
Compound reward is cumulative over local primitive-action reward contributions.
Quotient-edge reward aggregation must ignore internal collapsed edges and use only boundary-crossing preimage contributors.
The first implementation should operate with the following runtime objects.
Layer 1: Hidden Graph Contract
Objects:
StatePrimitiveActionBaseEdgeGraphSpecHiddenGraph
Purpose:
- answer legality and local-neighborhood questions
- define the hidden graph intensionally
Objects:
ExploredGraphVistaGraphNodeAnnotationStoreVistaPayloadLocalStar
Purpose:
- represent visited history
- represent the maintained
1-hop fringe - store state-local and tier-local runtime annotations
- support push/pull message passing
Objects:
QuotientTierViewProjectionMapCosetStoreTierAnnotationStore
Purpose:
- represent tier-indexed quotient views over the shared discovered graph
- support cumulative outgoing-edge queries by tier
- track current tier positions
Objects:
ContractionPolicyTowerRuntimeRuntimeSnapshot- optional
TrustworthinessPolicy
Purpose:
- apply contraction policy after each newly discovered step
- update the full tower immediately
- produce the runtime handoff object for downstream training/control logic
Objects:
GymnasiumAdapter- toy environment implementation
Purpose:
- expose the package runtime through standard RL entry points
- keep adapter code thinner than the runtime itself
Required properties:
- immutable
- hashable
- serializable
- stable equality
Required fields:
- stable identity field or canonical payload
- environment-specific structured payload
Examples of payload content:
- discretized position
- arm-joint coordinates
- bounded symbolic fields
- region ids
- categorical markers
Required properties:
- immutable
- hashable
- serializable
Required fields:
- action identity or canonical payload
- encoded action content
- optional labels
Required fields:
sourceactiontarget- optional labels
This is a central performance-sensitive runtime object.
It should store:
- local labels
- local notes/marks
- pushed vista payloads
- pulled vista payloads
- tier-local contraction marks
- tier-indexed cumulative outgoing-edge knowledge
It should support:
- rapid local update
- rapid cumulative tier query
- clean separation from state identity
It should not require labels to be numeric.
However, the implementation should leave room for later numeric encoding layers used for vectorized or GPU-backed optimization.
This must be interpreted operationally.
It is not generic context.
It carries newly available outgoing-edge possibility data.
Required contents:
- outgoing edge set from the sending state
- corresponding reachable target states
- relevant edge labels
- contraction or coset marks needed to interpret those outgoing possibilities
The payload may later be optimized into encoded numerical views, but the logical contents should remain as above.
Hidden Graph Contract
The hidden graph should expose the following behavior:
is_valid_stateis_valid_actionapply_actionis_valid_edgeout_actionsout_neighborsout_edges
The implementation should treat this as the mathematical source of legality and local query, not as a cache of precomputed adjacency data.
Materialization of all nodes or edges is optional and environment-dependent.
ExploredGraph stores:
- visited states
- traversed edges
- current base state
- current base path history
Its semantics are:
- the visited-history restriction of the hidden graph
VistaGraph stores:
- explored graph contents or references
1-hop queried neighborhoods at visited states- node annotations
- payload propagation results
Its semantics are:
- the explored graph plus the maintained local visible fringe
The package should treat the vista layer as the first genuinely operational graph layer.
This is a nested cumulative quotient view over the same discovered graph.
It should store:
- tier index
- source-layer reference
- projection maps
- coset membership
- current tier position
- tier-local cumulative outgoing-edge knowledge
The crucial rule is:
- the effective outgoing-edge query at tier
kis cumulative, not exact-tier-isolated
So the implementation must support queries like:
- "what outgoing-edge possibilities are available through tier
k?"
not merely:
- "what outgoing edges were introduced exactly at tier
k?"
The first implementation should treat the runtime update cycle as the core algorithm.
One primitive environment step means:
- execute primitive action in the base environment
- decode and validate resulting base transition
- add state and edge to explored graph
- refresh
1-hop vista at the newly occupied state - build the local star at tier
0 - apply the selected contraction policy at tier
0 - update tier-
0annotations and quotient view - propagate the contraction consequences through every higher tier immediately
- update the current position at every tier
- recompute cumulative outgoing-edge knowledge where affected
- compute current step reward
- update cumulative base-path reward
- update quotient-tier reward summaries using boundary-crossing preimage edges only
- produce a runtime snapshot
There should be no ambiguity in code about whether the full tower update happens now or later.
It happens now.
The first implementation should ship with both:
- a label-based policy
- a seeded-random policy
Both should implement the same interface:
- select an edge family from a local star
The runtime, not the policy, performs quotient/tower mutation.
Intended use:
- where contraction labels are present and meaningful
- including region-specific parameter-reduction-like situations
Intended use:
- general problem-agnostic contraction
- initial baseline behavior
The seeded-random implementation should be deterministic under fixed seed.
The first implementation must support primitive transition reward.
The first implementation must support cumulative path reward via weighted aggregation over primitive rewards.
The first implementation must support quotient-tier reward summaries defined from preimage contributors.
Important rule:
- internal collapsed member edges do not count as outgoing quotient-edge reward contributors
- only boundary-crossing preimage edges do
The runtime should at minimum expose:
- step reward
- cumulative path reward
- quotient-tier reward summaries
These should be present in the runtime snapshot, not reconstructed ad hoc later.
The review process clarified that trustworthiness is not central to version 1.
Reason:
- the full tower is updated immediately after each newly discovered state
- episode progression already proceeds bottom-tier to top-tier via lifts
So the first implementation should:
- keep
TrustworthinessPolicyas an extension point - not make it a central runtime dependency
This is a simplification, not a loss.
TowerRuntime.step(...) should return a full RuntimeSnapshot.
The snapshot should contain:
- current base state
- explored graph
- vista graph
- ordered quotient-tier views
- current position at every tier
- current step reward
- cumulative path reward
- quotient-tier reward summaries
- any needed metadata for downstream control/training
The package's own runtime should not reduce this to Gymnasium-style tuples internally.
Adapters may derive thinner views later.
The first toy environment should be:
- small
- inspectable
- structurally honest
- close in spirit to a real robotics constraint-navigation problem
Recommended direction:
- a discretized robot-arm-style constraint navigation environment
Required properties:
- hidden graph is not trivial to parametrize globally
- some regions admit local parameter-reduction-style labels
- some regions do not
- random contraction is meaningful
- label-based contraction is meaningful
- local reward on primitive transitions is meaningful
This environment should become:
- the first end-to-end example
- the first integration-test environment
- the first explanation artifact for future users
The initial code layout should implement the runtime ontology above.
Recommended layout:
src/state_collapser/
core/
state.py
action.py
edges.py
labels.py
annotations.py
rewards.py
graph/
spec.py
hidden_graph.py
explored_graph.py
vista_graph.py
local_star.py
quotient/
projection.py
cosets.py
tier_view.py
contract/
policy.py
selection.py
tower/
runtime.py
snapshot.py
trustworthiness.py
adapters/
gymnasium.py
examples/
robot_constraint_toy.py
This layout should be treated as the first implementation target unless a later explicit decision revises it.
The first implementation must be driven by unit tests from the beginning.
At minimum, tests must cover:
- state immutability and hashing
- primitive action immutability and hashing
- hidden graph legality predicates
- explored graph state/edge accumulation
- vista
1-hop refresh behavior - push/pull payload propagation
- node annotation store cumulative query semantics
- label-based contraction policy
- random contraction policy under fixed seed
- quotient projection correctness
- cumulative nested tier query semantics
- full tower update after new discovered state
- exclusion of internal collapsed edges from quotient reward aggregation
- runtime snapshot completeness
- toy-environment integration smoke behavior
The earlier design docs remain valid, but this blueprint sharpens them in the following ways:
gymnasiumshould now be read strictly as an environment shell- quotient tiers should now be read as nested cumulative views
- message passing should now be read as outgoing-edge possibility propagation
- trustworthiness should now be read as optional in version
1
The first implementation of state_collapser should be built as a realtime nested graph-discovery, annotation, and quotient-view engine over a shared discovered graph, with immediate tower updates after each new discovered state, cumulative tier-indexed outgoing-edge queries, local primitive reward, quotient reward aggregation over boundary-crossing contributors, and a full runtime snapshot handoff after each step.