feat(langchain-instrumentation): Add GraphInterrupt to IGNORED_EXCEPTION_PATTERNS #3231

Description

@AlexZKR

Is your feature request related to a problem? Please describe.

Yes. When utilizing LangGraph's native human-in-the-loop features (like interrupt()) to handle user suggestions, confirmations, or approvals, LangGraph raises a GraphInterrupt exception to unwind the execution stack and pause the graph.

Because openinference-instrumentation-langchain intercepts all escaping exceptions generically, it flags these intentional control-flow breaks as application errors (StatusCode.ERROR). In production configurations where interactive suggestions or human reviews occur frequently, this fills Arize Phoenix dashboards with false-positive error logs, ruins baseline alerting/success metrics, and masks actual system failures.

Describe the solution you'd like

Add r"^GraphInterrupt\(" to the IGNORED_EXCEPTION_PATTERNS list inside openinference/instrumentation/langchain/_tracer.py.

This directly mirrors the excellent design pattern introduced in version v0.1.37 for Command( and ParentCommand(, allowing expected framework lifecycle events to cleanly resolve with a span status of StatusCode.OK instead of triggering an operational error alert.
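A minimal sketch of the behavior described above, assuming the tracer checks each pattern against the start of the run's error string and that this string begins with the exception's repr. The helper `status_for` and the plain-string statuses are hypothetical stand-ins for illustration, not the library's API.

```python
import re

# Patterns as proposed in this issue; the backslash escapes the literal "(".
IGNORED_EXCEPTION_PATTERNS = [
    re.compile(r"^Command\("),
    re.compile(r"^ParentCommand\("),
    re.compile(r"^GraphInterrupt\("),
]


def status_for(error_text):
    """Return "OK" or "ERROR" for a run's error string (hypothetical helper)."""
    if error_text is None:
        # No error recorded: the span is successful.
        return "OK"
    if any(p.match(error_text) for p in IGNORED_EXCEPTION_PATTERNS):
        # Expected control-flow signal: do not report it as a failure.
        return "OK"
    return "ERROR"


print(status_for("GraphInterrupt((Interrupt(value='x'),))"))
print(status_for("ValueError('boom')"))
```

With these assumptions, an error string starting with `GraphInterrupt(` is treated like the existing `Command(` and `ParentCommand(` cases, while any other exception text still maps to an error status.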

Proposed addition to openinference/instrumentation/langchain/_tracer.py

    IGNORED_EXCEPTION_PATTERNS = [
        r"^Command\(",
        r"^ParentCommand\(",
        r"^GraphInterrupt\(",  # <--- Support native LangGraph interrupts cleanly
    ]

Describe alternatives you've considered

Monkey-patching: Dynamically appending the regex to the internal OpenInference list at application initialization (from openinference.instrumentation.langchain._tracer import IGNORED_EXCEPTION_PATTERNS). While this functions as an interim patch, it relies on mutating non-public implementation details that could break across minor library updates.

Span interception wrappers: Manually wrapping every graph execution call, catching the exception, updating the OTel span status by hand, and re-raising it. This adds substantial boilerplate to the application layer.
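For reference, the monkey-patching alternative could look roughly like the sketch below. It assumes the private list named in this issue is importable from `_tracer` and accepts compiled patterns alongside its existing entries. `add_ignored_pattern` is a hypothetical helper, and the empty fallback list exists only so the snippet runs when the package is not installed.

```python
import re


def add_ignored_pattern(patterns, regex):
    """Append `regex` (compiled) to `patterns` unless it is already present.

    Existing entries may be plain strings or compiled patterns, so both
    forms are compared by their pattern text.
    """
    if any(getattr(p, "pattern", p) == regex for p in patterns):
        return False
    patterns.append(re.compile(regex))
    return True


try:
    # Private module: this import path comes from the issue text and is an
    # implementation detail that may change between releases.
    from openinference.instrumentation.langchain._tracer import (
        IGNORED_EXCEPTION_PATTERNS,
    )
except ImportError:
    # Stand-in so the sketch still runs without the package installed.
    IGNORED_EXCEPTION_PATTERNS = []

# Run once at application start-up, before any graph is invoked.
add_ignored_pattern(IGNORED_EXCEPTION_PATTERNS, r"^GraphInterrupt\(")
```

The helper is idempotent, so calling it from several entry points does not add duplicate patterns. The fragility noted above remains: if the list is renamed, moved, or read differently in a later release, the patch stops having any effect without raising an error.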

Additional context

This request addresses identical telemetry pain points to the Command lifecycle patches. Below is an example payload layout of how the exception string is currently formatted when captured by the auto-instrumentation engine:

JSON
{
  "exception.type": "langgraph.errors.GraphInterrupt",
  "exception.escaped": "False",
  "exception.message": "(Interrupt(value={'type': 'ask_options', 'options': [...], 'question': '...'}, id='...'),)",
  "exception.stacktrace": "Traceback (most recent call last):\n File \".../langgraph/_internal/_runnable.py\", line 733, in ainvoke\n ... langgraph.errors.GraphInterrupt"
}

Link to the original question: Arize-ai/phoenix#13677 (comment)
