Surface harness subprocess stderr on non-zero exit (harness failures are currently undiagnosable) #177

@khaliqgant

Problem: ctx.harness.run() (spawnAndCapture in runtime cloud-defaults.js) CAPTURES the harness (claude/codex) subprocess stderr but DROPS it from harnessResult, which returns only { output, exitCode, durationMs, usage }. Because the subprocess stderr is piped (not inherited), it never reaches the sandbox console either. Net: a harness failure surfaces ONLY as 'harness exited N' with no cause, and recovering the real error requires a live-box probe.

Impact (2026-06-01 daily-ship saga): this hid 'error: unknown option --name' (claude CLI rejecting an unsupported persona-kit flag) for HOURS. The exit-1 looked opaque; only a live in-box reproduction of the exact harness command surfaced the real stderr.

Ask: surface the harness subprocess stderr on non-zero exit. Include it in harnessResult (bounded) and/or emit a structured log line carrying the secret-redacted stderr when exitCode != 0, applying the standard token/key redaction.
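A minimal sketch of the "bounded" and "secret-redacted" parts of the ask, for discussion only. The helper names (`redactSecrets`, `boundTail`, `prepareStderr`), the 4096-character limit, and the regex patterns are all assumptions made for illustration; the project's standard token/key redaction is not shown in this issue and should be used in place of these patterns.

```javascript
// Hypothetical patterns standing in for the project's standard redaction,
// which is not shown in this issue.
const SECRET_PATTERNS = [
  /\bsk-[A-Za-z0-9_-]{16,}/g,          // API-key-shaped strings
  /\bgh[pousr]_[A-Za-z0-9]{20,}/g,     // GitHub-token-shaped strings
  /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, // bearer tokens in header dumps
];

// Assumed bound; the issue only says "bounded".
const MAX_STDERR_CHARS = 4096;

// Replace every match of every pattern with a fixed marker.
function redactSecrets(text) {
  return SECRET_PATTERNS.reduce(
    (acc, pattern) => acc.replace(pattern, "[REDACTED]"),
    text
  );
}

// Keep the tail of the text: the last lines of stderr often carry the error.
function boundTail(text, max = MAX_STDERR_CHARS) {
  if (text.length <= max) return text;
  return "[truncated]\n" + text.slice(text.length - max);
}

// Redact first, then truncate, so a secret cut in half by the bound
// cannot slip past the patterns.
function prepareStderr(rawStderr) {
  return boundTail(redactSecrets(String(rawStderr ?? "")));
}
```

The value returned by `prepareStderr` is what would be attached to harnessResult (for example as an extra `stderr` field, which is itself an assumed name) or carried in the log line.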

Acceptance: a harness non-zero exit produces a structured, secret-redacted log line containing the subprocess stderr, so the cause is visible without a live-box probe.
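One way the acceptance criterion could look in practice, sketched under assumptions: the function name `harnessFailureLogLine` and the log field names (`level`, `event`, `stderr`) are hypothetical, not an existing schema, and a throwaway Node child process stands in for the real harness. `spawnSync` is used only to keep the sketch short; the actual spawnAndCapture may well be asynchronous.

```javascript
const { spawnSync } = require("node:child_process");

// Builds one JSON log line for a failed harness run, or null on success.
// Field names here are illustrative assumptions.
function harnessFailureLogLine({ exitCode, durationMs, stderr }) {
  if (exitCode === 0) return null;
  return JSON.stringify({
    level: "error",
    event: "harness_nonzero_exit",
    exitCode,
    durationMs,
    stderr, // caller is expected to pass redacted, bounded text
  });
}

// Stand-in for the harness: a child that fails the way the claude CLI did.
const started = Date.now();
const child = spawnSync(
  process.execPath,
  ["-e", "console.error('error: unknown option --name'); process.exitCode = 1;"],
  { encoding: "utf8" } // stdio defaults to pipes, so stderr is captured, not inherited
);

const line = harnessFailureLogLine({
  exitCode: child.status,
  durationMs: Date.now() - started,
  stderr: child.stderr, // a real implementation would redact and bound this first
});

// Emit the line only when the run failed.
if (line !== null) console.error(line);
```

With a line like this in the sandbox logs, the 'unknown option --name' cause would be visible next to the exit code without reproducing the harness command on a live box.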

Filed by lead-7 (agent-relay swarm) per operator request after the daily-ship --name investigation.
