GenAI semconv needs vendor-neutral reasoning parts and continuity tokens

### Area(s)

area:gen-ai

### What's missing?

Reasoning-capable models now return more than assistant text and tool calls.

They can return ordered reasoning or thinking parts, plus opaque continuity values that the next request must carry forward. Those values are not ordinary metadata. They are part of the provider protocol for continuing a reasoning turn, especially when reasoning happens before a tool call.

Today, the GenAI semantic conventions mostly model assistant output as text plus tool calls. That leaves no vendor-neutral place to represent:

- a reasoning/thinking part in the assistant message
- the original order between reasoning, text, and tool calls
- opaque continuity tokens or signatures attached to those parts
- signatures attached directly to a tool call
- reasoning-specific token accounting and request-side reasoning controls

The practical problem is replay.

A trace can show that a model called a tool, and it can show the tool result, but it may not contain enough information to reconstruct the next provider request. If a tool runner rebuilds the next turn from only the tool call and tool result, it can silently drop the reasoning part that preceded the tool call.

That makes the trace incomplete in a way that matters operationally. You cannot reliably replay the conversation from telemetry, compare provider behavior, debug a failed tool loop, or prove what was sent on turn N+1 from the trace of turn N.

This is not about standardizing or exposing private chain-of-thought text. It is about preserving the provider-visible message structure and the opaque continuity values required to continue the conversation, subject to normal redaction and privacy policy.

### Cross-vendor pattern

The providers use different names and wire formats, but the shape is similar.

**OpenAI Responses API**

- Assistant output can include an `output[]` item with `type: "reasoning"`.
- Reasoning continuity can be carried by `encrypted_content` and the reasoning item identity.
- Reasoning token usage can be reported separately, for example in `output_tokens_details.reasoning_tokens`.
- Request-side controls include fields such as `reasoning.effort` and reasoning summary settings.

**Anthropic Messages API**

- Assistant content can include `thinking` and `redacted_thinking` blocks.
- Continuity is represented through values such as `signature` for visible thinking, or redacted data for redacted thinking.
- Thinking tokens are counted in output tokens, but are not always exposed as a separate usage bucket.
- Request-side controls include thinking type, token budget, and display settings.

**Google Gemini API**

- Response `parts[]` can include thinking parts, for example with `thought: true`.
- Continuity is represented with `thoughtSignature`.
- The signature can be attached to a data-bearing part, including a `functionCall`, so this cannot be modeled only as metadata on a reasoning text block.
- Usage can include `thoughtsTokenCount`.
- Request-side controls include thinking budget, thinking level, and whether thoughts are included.

The common failure mode is the same: a reasoning/thinking part immediately precedes a tool call, but the next request is rebuilt from only the tool call and tool result. Depending on the provider and model, dropping that reasoning continuity can cause a hard request error or a lower-quality continuation because the model has to reason again from an incomplete context.

### Describe the solution you'd like

Add a cross-vendor abstraction for reasoning content and reasoning continuity in GenAI semantic conventions.

A useful model would cover:

1. **Reasoning as an ordered message part**

   Add a message part type for reasoning/thinking content that can be interleaved with text, tool calls, and tool results. The original provider order should be recoverable from telemetry.

2. **Continuity tokens and signatures**

   Provide a place to record the presence and metadata for opaque continuity values such as OpenAI `encrypted_content`, Anthropic `signature` / redacted data, and Gemini `thoughtSignature`.

   The conventions should distinguish between values that carry hidden state and values that authenticate visible reasoning text, even if both need to be preserved by the client.

3. **Tool-call-attached continuity**

   Allow continuity values to attach to a tool call part, not only to a reasoning part. Gemini's `thoughtSignature` on a `functionCall` is the motivating example.

4. **Replay guidance**

   Define what it means for a trace to be replayable enough to reconstruct the next provider request. At minimum, replay needs the original part order and any provider-required continuity fields, when recording those fields is allowed by policy.

   If payload capture is disabled, the trace should still be able to say that a continuity value was present, how large it was, and that full replay is intentionally not possible from the redacted trace.

5. **Reasoning token accounting**

   Standardize a reasoning/thinking token breakdown where providers expose it, while documenting that some providers include thinking in output tokens without a separate bucket.

6. **Request-side reasoning config**

   Record the client's requested reasoning configuration: effort, budget, level, display/include-thoughts options, or the closest provider-specific equivalent.

7. **Privacy and redaction guidance**

   Continuity values can be large, opaque, and state-bearing. The default guidance should favor recording presence, length, type, and attachment point rather than raw bytes. If an instrumentation does capture payloads for replay, that should be opt-in and governed by the same masking rules as other sensitive GenAI content.

### Why this matters

This gap shows up in ordinary tool-calling systems, not only in specialized research setups.

A common production pattern is:

1. call the model
2. receive assistant reasoning plus a tool call
3. execute the tool
4. send the tool result back to the model

For reasoning models, step 4 may need more than the tool result. It may also need the reasoning or continuity part from step 2. If telemetry drops that part, the trace no longer represents what the application needed to send.

That hurts several practical workflows:

- debugging rejected follow-up requests
- replaying incidents from traces
- comparing behavior across providers
- migrating tool loops from one provider API to another
- verifying that instrumentation preserved the full assistant turn
- evaluating quality regressions when reasoning continuity was accidentally dropped

### Related work

This overlaps with, but is broader than:

- #32, which tracks additional `MessagePart` types
- #76, which tracks detailed token usage metrics
- #189, which tracks request-side reasoning effort

OpenInference has also prototyped this across OpenAI, Anthropic, and Gemini: capture the assistant turn into span attributes, reconstruct the next turn from those attributes, and include negative checks that remove the continuity value to prove that it is load-bearing.

I would be happy to contribute the vendor comparison and prototype findings to a GenAI SIG discussion or a spec PR.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

GenAI semconv needs vendor-neutral reasoning parts and continuity tokens #192

Area(s)

What's missing?

Cross-vendor pattern

Describe the solution you'd like

Why this matters

Related work

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

GenAI semconv needs vendor-neutral reasoning parts and continuity tokens #192

Description

Area(s)

What's missing?

Cross-vendor pattern

Describe the solution you'd like

Why this matters

Related work

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions