BPF demux runtime loader: aya 0.13 SkReuseport program type gap #100

Description

@fabriziosalmi

Context

Follow-up to #53. The v0.2-era foundation work shipped:

What's NOT yet wired: the userspace runtime that loads the eBPF program from disk and attaches it via setsockopt(SO_ATTACH_REUSEPORT_EBPF). The blocker is upstream:

The blocker

aya (the BPF runtime crate Zion already uses for XDP) recognises BPF_PROG_TYPE_SK_REUSEPORT in its programs/info.rs bpf_prog_type enum but does not ship a typed program helper for it. The programs/ directory has xdp.rs, sk_lookup.rs, sk_skb.rs, sk_msg.rs, … and no sk_reuseport.rs.

Concretely, this code does NOT compile against aya 0.13:

let mut bpf = Ebpf::load_file("bpf/zion-bpf-demux/.../zion-bpf-demux")?;
let prog: &mut SkReuseport = bpf  // <-- aya::programs::SkReuseport doesn't exist
    .program_mut("zion_bpf_demux")
    .ok_or(...)
    .try_into()?;
prog.load()?;
prog.attach(socket_fd)?;

Program::load() on the untyped handle sets the wrong expected_attach_type, so the kernel rejects the program at bpf(BPF_PROG_LOAD) time, before we would even reach setsockopt.

Three viable paths

  1. Upstream contribution to aya — add programs::SkReuseport mirroring the existing SkLookup / SkSkb / SkMsg helpers (~50 lines of Rust). Right shape; cleanest integration.
  2. Switch BPF runtime to libbpf-rs — has full SK_REUSEPORT support today. Adds a second BPF runtime to Zion's dependency closure (XDP path stays on aya). Not preferred.
  3. Hand-roll bpf(BPF_PROG_LOAD) + setsockopt — pure-libc, no high-level helpers. ~150 lines of unsafe FFI plus our own ELF section parsing. Doable but bypasses exactly what aya exists to encapsulate.
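For a sense of scale, the attach half of path 3 is the small part. The sketch below assumes a program fd has already been obtained from a successful bpf(BPF_PROG_LOAD), which is the hard part and is not shown. The function name attach_reuseport_prog is hypothetical, and the two constants are copied from asm-generic/socket.h, so they are an assumption that holds on x86_64 and aarch64 Linux but not on every architecture.

```rust
use std::io;
use std::mem::size_of;
use std::os::raw::{c_int, c_void};

// Assumed values from asm-generic/socket.h (x86_64 / aarch64 Linux).
// A few architectures (e.g. mips, sparc) number these differently.
const SOL_SOCKET: c_int = 1;
const SO_ATTACH_REUSEPORT_EBPF: c_int = 52;

// Declared by hand so the sketch needs no crates; std already links libc.
unsafe extern "C" {
    fn setsockopt(
        fd: c_int,
        level: c_int,
        name: c_int,
        value: *const c_void,
        len: u32, // socklen_t
    ) -> c_int;
}

/// Hypothetical helper: attach an already-loaded SK_REUSEPORT program,
/// identified by `prog_fd`, to the reuseport group of `listener_fd`.
fn attach_reuseport_prog(listener_fd: c_int, prog_fd: c_int) -> io::Result<()> {
    // The kernel expects the option value to be a pointer to the program fd.
    let rc = unsafe {
        setsockopt(
            listener_fd,
            SOL_SOCKET,
            SO_ATTACH_REUSEPORT_EBPF,
            &prog_fd as *const c_int as *const c_void,
            size_of::<c_int>() as u32,
        )
    };
    if rc == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}
```

Calling it with a file descriptor that does not refer to a loaded BPF program returns an Err carrying the kernel's errno, which is the error the caller would surface in the boot log.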

Recommended: (1).

Acceptance for closing this issue

  • aya gains programs::SkReuseport (whether by upstream PR or a thin local shim that mirrors aya's typed-program convention)
  • src/bpf_demux.rs exposes attach_to_listener(listener_raw_fd) -> Result<...>:
    • Ebpf::load_file(crate::bpf_demux::object_path())
    • program.try_into::<SkReuseport>() + program.load()
    • setsockopt(SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF, &program_fd) on the listener
    • Boot log: bpf_demux=attached fd=N on success / structured warn + auto-fallback on any failure
  • main.rs: when cfg!(feature = "bpf-demux") && DemuxReadiness::Ready, call attach_to_listener after the HTTPS bind. Failure NEVER aborts boot; it falls through to the kernel's default reuseport hash.
  • bpftool prog list on a deployed instance shows the loaded program
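The "failure never aborts boot" requirement could be shaped roughly as follows. This is only a sketch: DemuxOutcome, try_attach_demux, and the wording of the warning line are hypothetical (the issue fixes only the bpf_demux=attached fd=N success line), and the attach step is passed in as a closure so the control flow can be exercised without a real BPF program.

```rust
use std::fmt::Display;

/// Hypothetical result of the boot-time attach attempt.
#[derive(Debug, PartialEq, Eq)]
enum DemuxOutcome {
    /// Program attached; the demux is live on this listener.
    Attached,
    /// Feature disabled or readiness probe not Ready; nothing was attempted.
    Disabled,
    /// Attach failed; the kernel's default reuseport hash stays in effect.
    FellBack(String),
}

/// Run `attach` if the demux is enabled. Never panics and never returns an
/// error: every failure is logged and converted into `FellBack`.
fn try_attach_demux<E: Display>(
    enabled: bool,
    listener_fd: i32,
    attach: impl FnOnce(i32) -> Result<(), E>,
) -> DemuxOutcome {
    if !enabled {
        return DemuxOutcome::Disabled;
    }
    match attach(listener_fd) {
        Ok(()) => {
            // Success line as specified in the acceptance criteria.
            eprintln!("bpf_demux=attached fd={listener_fd}");
            DemuxOutcome::Attached
        }
        Err(e) => {
            // Assumed warning format; the issue only asks for a structured warn.
            eprintln!("bpf_demux=fallback fd={listener_fd} reason={e}");
            DemuxOutcome::FellBack(e.to_string())
        }
    }
}
```

In main.rs the enabled argument would be the cfg!(feature = "bpf-demux") && DemuxReadiness::Ready check, and the closure would wrap attach_to_listener. Returning a value rather than a Result makes it hard for a caller to propagate the failure with ? by accident.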

Out of scope (further v3 work)

  • Multi-socket SO_REUSEPORT bind for true worker affinity — the demux meaningfully routes only when there are N>1 sockets in the group. Today's single-socket bind makes the program a no-op even when attached. Multi-socket bind is a separate refactor of main.rs's HTTPS accept loop (touching both the tokio path and the io_uring-accept path).
  • Real demux body (replacing SK_PASS with NUMA-aware / hash-based routing) — depends on multi-socket bind landing first.
  • Bench --features bpf-demux showing no regression on the TCP-only workload — depends on the runtime path being live.

Labels

enhancement (New feature or request), mesh (AIMP-based federated signals), rust (Pull requests that update rust code)
