Area(s)
area:gen-ai
What's missing?
Reasoning-capable models now return more than assistant text and tool calls.
They can return ordered reasoning or thinking parts, plus opaque continuity values that the next request must carry forward. Those values are not ordinary metadata. They are part of the provider protocol for continuing a reasoning turn, especially when reasoning happens before a tool call.
Today, the GenAI semantic conventions mostly model assistant output as text plus tool calls. That leaves no vendor-neutral place to represent:
- a reasoning/thinking part in the assistant message
- the original order between reasoning, text, and tool calls
- opaque continuity tokens or signatures attached to those parts
- signatures attached directly to a tool call
- reasoning-specific token accounting and request-side reasoning controls
The practical problem is replay.
A trace can show that a model called a tool, and it can show the tool result, but it may not contain enough information to reconstruct the next provider request. If a tool runner rebuilds the next turn from only the tool call and tool result, it can silently drop the reasoning part that preceded the tool call.
That makes the trace incomplete in a way that matters operationally. You cannot reliably replay the conversation from telemetry, compare provider behavior, debug a failed tool loop, or prove what was sent on turn N+1 from the trace of turn N.
This is not about standardizing or exposing private chain-of-thought text. It is about preserving the provider-visible message structure and the opaque continuity values required to continue the conversation, subject to normal redaction and privacy policy.
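To make the replay gap concrete, here is a minimal sketch. The part shapes, field names, and helper functions are hypothetical and provider-neutral; they are not taken from any SDK or from the current conventions. It shows how a rebuild that looks only at the tool call and tool result loses the reasoning part without raising any error.

```python
# Hypothetical, provider-neutral part shapes. None of these names come from a real SDK.
assistant_turn = [
    {"type": "reasoning", "continuity": "opaque-value-1"},
    {"type": "tool_call", "id": "call_1", "name": "get_weather",
     "arguments": {"city": "Oslo"}},
]
tool_result = {"type": "tool_result", "tool_call_id": "call_1", "content": "4 C, rain"}


def rebuild_lossy(turn, result):
    """Rebuild the follow-up from the tool call and tool result only."""
    calls = [part for part in turn if part["type"] == "tool_call"]
    return calls + [result]


def rebuild_faithful(turn, result):
    """Rebuild the follow-up from every part, in the original order."""
    return list(turn) + [result]


def dropped_parts(turn, rebuilt):
    """Parts of the assistant turn that are missing from the rebuilt request."""
    return [part for part in turn if part not in rebuilt]


lossy = rebuild_lossy(assistant_turn, tool_result)
faithful = rebuild_faithful(assistant_turn, tool_result)

print([p["type"] for p in dropped_parts(assistant_turn, lossy)])     # ['reasoning']
print([p["type"] for p in dropped_parts(assistant_turn, faithful)])  # []
```

The lossy path succeeds locally, which is why the drop is easy to miss until the provider rejects or degrades the follow-up request.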
Cross-vendor pattern
The providers use different names and wire formats, but the shape is similar.
OpenAI Responses API
- Assistant output can include an output[] item with type: "reasoning".
- Reasoning continuity can be carried by encrypted_content and the reasoning item identity.
- Reasoning token usage can be reported separately, for example in output_tokens_details.reasoning_tokens.
- Request-side controls include fields such as reasoning.effort and reasoning summary settings.
Anthropic Messages API
- Assistant content can include thinking and redacted_thinking blocks.
- Continuity is represented through values such as signature for visible thinking, or redacted data for redacted thinking.
- Thinking tokens are counted in output tokens, but are not always exposed as a separate usage bucket.
- Request-side controls include thinking type, token budget, and display settings.
Google Gemini API
- Response parts[] can include thinking parts, for example with thought: true.
- Continuity is represented with thoughtSignature.
- The signature can be attached to a data-bearing part, including a functionCall, so this cannot be modeled only as metadata on a reasoning text block.
- Usage can include thoughtsTokenCount.
- Request-side controls include thinking budget, thinking level, and whether thoughts are included.
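Taken together, the usage fields listed for the three providers suggest a small normalization step. The sketch below is illustrative only: the provider labels ("openai", "anthropic", "gemini"), the function name, and the sample usage payloads are assumptions, and the Anthropic branch simply encodes the case where no separate bucket is reported.

```python
from typing import Any, Mapping, Optional


def reasoning_tokens(provider: str, usage: Mapping[str, Any]) -> Optional[int]:
    """Return the separately reported reasoning token count.

    None means the provider did not report a separate bucket, which is
    different from reporting zero reasoning tokens.
    """
    if provider == "openai":
        details = usage.get("output_tokens_details") or {}
        return details.get("reasoning_tokens")
    if provider == "gemini":
        return usage.get("thoughtsTokenCount")
    if provider == "anthropic":
        # Assumption for this sketch: thinking is folded into output tokens.
        return None
    raise ValueError(f"unknown provider: {provider}")


# Hand-written sample payloads, trimmed to the fields discussed above.
openai_usage = {"output_tokens": 120, "output_tokens_details": {"reasoning_tokens": 64}}
anthropic_usage = {"output_tokens": 120}
gemini_usage = {"thoughtsTokenCount": 64}

print(reasoning_tokens("openai", openai_usage))        # 64
print(reasoning_tokens("anthropic", anthropic_usage))  # None
print(reasoning_tokens("gemini", gemini_usage))        # 64
```

Returning None instead of 0 keeps "not reported" distinguishable from "no reasoning tokens used", which matters when comparing providers.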
The common failure mode is the same: a reasoning/thinking part immediately precedes a tool call, but the next request is rebuilt from only the tool call and tool result. Depending on the provider and model, dropping that reasoning continuity can cause a hard request error or a lower-quality continuation because the model has to reason again from an incomplete context.
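The shared shape can be sketched as a mapping from each provider's parts to an ordered list of (kind, continuity field) pairs. This is a rough illustration, not a proposed schema: the function name, the kind labels, and the provider labels are hypothetical, and the sample payloads are hand-written and trimmed to the fields mentioned above plus the providers' tool-call part types.

```python
from typing import Any, Dict, List, Optional, Tuple

Part = Dict[str, Any]


def normalize(provider: str, items: List[Part]) -> List[Tuple[str, Optional[str]]]:
    """Map provider parts to (kind, continuity_field) pairs, preserving order."""
    out = []
    for item in items:
        if provider == "openai":
            kind = {"reasoning": "reasoning", "function_call": "tool_call"}.get(
                item.get("type"), "text")
            fields = ["encrypted_content"] if kind == "reasoning" else []
        elif provider == "anthropic":
            kind = {"thinking": "reasoning", "redacted_thinking": "reasoning",
                    "tool_use": "tool_call"}.get(item.get("type"), "text")
            fields = ["signature", "data"] if kind == "reasoning" else []
        elif provider == "gemini":
            if "functionCall" in item:
                kind = "tool_call"
            elif item.get("thought"):
                kind = "reasoning"
            else:
                kind = "text"
            # The signature may sit on any part, including a functionCall.
            fields = ["thoughtSignature"]
        else:
            raise ValueError(f"unknown provider: {provider}")
        present = next((name for name in fields if item.get(name)), None)
        out.append((kind, present))
    return out


openai_output = [
    {"type": "reasoning", "id": "rs_1", "encrypted_content": "opaque-a"},
    {"type": "function_call", "call_id": "call_1", "name": "get_weather", "arguments": "{}"},
]
anthropic_content = [
    {"type": "thinking", "thinking": "visible text", "signature": "opaque-b"},
    {"type": "tool_use", "id": "toolu_1", "name": "get_weather", "input": {}},
]
gemini_parts = [
    {"text": "visible text", "thought": True},
    {"functionCall": {"name": "get_weather", "args": {}}, "thoughtSignature": "opaque-c"},
]

print(normalize("openai", openai_output))
# [('reasoning', 'encrypted_content'), ('tool_call', None)]
print(normalize("anthropic", anthropic_content))
# [('reasoning', 'signature'), ('tool_call', None)]
print(normalize("gemini", gemini_parts))
# [('reasoning', None), ('tool_call', 'thoughtSignature')]
```

In the Gemini sample the continuity value rides on the tool call rather than on the reasoning part, which is the case a reasoning-only model would miss.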
Describe the solution you'd like
Add a cross-vendor abstraction for reasoning content and reasoning continuity in GenAI semantic conventions.
A useful model would cover:
- Reasoning as an ordered message part
  Add a message part type for reasoning/thinking content that can be interleaved with text, tool calls, and tool results. The original provider order should be recoverable from telemetry.
- Continuity tokens and signatures
  Provide a place to record the presence and metadata for opaque continuity values such as OpenAI encrypted_content, Anthropic signature / redacted data, and Gemini thoughtSignature.
  The conventions should distinguish between values that carry hidden state and values that authenticate visible reasoning text, even if both need to be preserved by the client.
- Tool-call-attached continuity
  Allow continuity values to attach to a tool call part, not only to a reasoning part. Gemini's thoughtSignature on a functionCall is the motivating example.
- Replay guidance
  Define what it means for a trace to be replayable enough to reconstruct the next provider request. At minimum, replay needs the original part order and any provider-required continuity fields, when recording those fields is allowed by policy.
  If payload capture is disabled, the trace should still be able to say that a continuity value was present, how large it was, and that full replay is intentionally not possible from the redacted trace.
- Reasoning token accounting
  Standardize a reasoning/thinking token breakdown where providers expose it, while documenting that some providers include thinking in output tokens without a separate bucket.
- Request-side reasoning config
  Record the client's requested reasoning configuration: effort, budget, level, display/include-thoughts options, or the closest provider-specific equivalent.
- Privacy and redaction guidance
  Continuity values can be large, opaque, and state-bearing. The default guidance should favor recording presence, length, type, and attachment point rather than raw bytes. If an instrumentation does capture payloads for replay, that should be opt-in and governed by the same masking rules as other sensitive GenAI content.
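One possible shape for the list above, as a sketch only. The class names, the record keys, and the role labels ("hidden_state", "authenticates_visible_text") are hypothetical and not proposed attribute names, and classifying the sample thoughtSignature as hidden state is an assumption made for the example. The point is the default: record presence, length, type, and attachment point, and capture raw values only when explicitly enabled.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Continuity:
    field_name: str  # provider field name, e.g. "thoughtSignature"
    role: str        # "hidden_state" or "authenticates_visible_text"
    value: str       # opaque payload; not recorded by default


@dataclass
class Part:
    kind: str  # "reasoning", "text", "tool_call", or "tool_result"
    continuity: Optional[Continuity] = None


def describe(parts: List[Part], capture_payloads: bool = False) -> Dict[str, Any]:
    """Summarize an assistant turn for telemetry, redacted unless opted in."""
    records = []
    for index, part in enumerate(parts):  # index preserves provider order
        record: Dict[str, Any] = {
            "index": index,
            "kind": part.kind,  # attachment point of any continuity value
            "continuity_present": part.continuity is not None,
        }
        if part.continuity is not None:
            record["continuity_field"] = part.continuity.field_name
            record["continuity_role"] = part.continuity.role
            record["continuity_length"] = len(part.continuity.value)
            if capture_payloads:  # opt-in only
                record["continuity_value"] = part.continuity.value
        records.append(record)
    # Without payloads, a turn that carried continuity cannot be fully replayed.
    replayable = capture_payloads or all(p.continuity is None for p in parts)
    return {"parts": records, "replayable": replayable}


turn = [
    Part("reasoning"),
    # Role is an assumed classification for this example.
    Part("tool_call", Continuity("thoughtSignature", "hidden_state", "c2lnbmF0dXJl")),
]

redacted = describe(turn)
print(redacted["replayable"])                       # False
print(redacted["parts"][1]["continuity_length"])    # 12
print("continuity_value" in redacted["parts"][1])   # False
```

With capture_payloads=True the same call records the raw value and reports the turn as replayable, which keeps the redaction decision explicit in the trace.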
Why this matters
This gap shows up in ordinary tool-calling systems, not only in specialized research setups.
A common production pattern is:
1. call the model
2. receive assistant reasoning plus a tool call
3. execute the tool
4. send the tool result back to the model
For reasoning models, step 4 may need more than the tool result. It may also need the reasoning or continuity part from step 2. If telemetry drops that part, the trace no longer represents what the application needed to send.
That hurts several practical workflows:
- debugging rejected follow-up requests
- replaying incidents from traces
- comparing behavior across providers
- migrating tool loops from one provider API to another
- verifying that instrumentation preserved the full assistant turn
- evaluating quality regressions when reasoning continuity was accidentally dropped
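Several of these workflows reduce to one check: does the follow-up request still contain the full assistant turn? A minimal sketch of such a check follows, assuming each recorded part carries a stable id, a kind, and an optional continuity value. Those field names and the function itself are hypothetical.

```python
from typing import Any, Dict, List

Part = Dict[str, Any]


def follow_up_problems(assistant_turn: List[Part], next_request: List[Part]) -> List[str]:
    """Compare the recorded assistant turn with the parts sent on the next request."""
    problems = []
    sent = [p for p in next_request if p["kind"] != "tool_result"]
    sent_ids = [p["id"] for p in sent]

    # 1. Every part of the assistant turn should be sent back.
    for part in assistant_turn:
        if part["id"] not in sent_ids:
            problems.append(f"dropped {part['kind']} part {part['id']}")

    # 2. Parts that were sent back should keep their original order.
    kept = [p["id"] for p in assistant_turn if p["id"] in sent_ids]
    if [i for i in sent_ids if i in kept] != kept:
        problems.append("parts were reordered")

    # 3. Continuity values should survive unchanged.
    by_id = {p["id"]: p for p in sent}
    for part in assistant_turn:
        other = by_id.get(part["id"])
        if other is not None and part.get("continuity") != other.get("continuity"):
            problems.append(f"continuity changed on {part['id']}")
    return problems


turn = [
    {"id": "p0", "kind": "reasoning", "continuity": "opaque-a"},
    {"id": "p1", "kind": "tool_call", "continuity": None},
]
result = {"id": "p2", "kind": "tool_result"}

print(follow_up_problems(turn, turn + [result]))
# []
print(follow_up_problems(turn, [turn[1], result]))
# ['dropped reasoning part p0']
print(follow_up_problems(turn, [{**turn[0], "continuity": None}, turn[1], result]))
# ['continuity changed on p0']
```

The last call mirrors a negative check: removing the continuity value on purpose and confirming that the difference is detected.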
Related work
This overlaps with, but is broader than:
- MessagePart types #32, which tracks additional MessagePart types
- gen_ai.request.reasoning_effort #189, which tracks request-side reasoning effort
OpenInference has also prototyped this across OpenAI, Anthropic, and Gemini: capture the assistant turn into span attributes, reconstruct the next turn from those attributes, and include negative checks that remove the continuity value to prove that it is load-bearing.
I would be happy to contribute the vendor comparison and prototype findings to a GenAI SIG discussion or a spec PR.