Add Agent Framework reference scenario#325
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a new Python reference scenario for the Microsoft Agent Framework, exercising its native GenAI telemetry (agent invocation, inference, and tool execution) against the repo’s local OpenAI-compatible mock server. It also extends the mock server’s /responses behavior to support Responses-style tool calling and usage detail fields so the Agent Framework can emit current GenAI usage attributes, and refreshes the generated reference reports/README.
Changes:
- Added
reference/scenarios/agent-framework/(scenario script, dependency metadata, lockfile, and generateddata.json) to capture native Agent Framework telemetry. - Extended the mock OpenAI Responses endpoint to return function-call outputs and richer usage details (
cached_tokens,reasoning_tokens). - Refreshed generated coverage reports and the reference README to include
agent-frameworkas a supporting library.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| reference/src/semconv_genai/mock_server/openai.py | Adds Responses usage details and a tool-call response path for /responses. |
| reference/src/semconv_genai/mock_server/_common.py | Broadens tool-argument schema extraction to support additional tool shapes. |
| reference/scenarios/agent-framework/scenario.py | New scenario exercising Agent Framework native telemetry for agent runs, inference, and tools. |
| reference/scenarios/agent-framework/pyproject.toml | Defines dependencies for the new Agent Framework scenario. |
| reference/scenarios/agent-framework/uv.lock | Locks scenario dependencies for reproducible runs. |
| reference/scenarios/agent-framework/data.json | Captured span→attribute coverage emitted by the scenario run. |
| reference/reports/invoke-agent-internal-span.md | Regenerated report including agent-framework coverage. |
| reference/reports/inference-span.md | Regenerated report including agent-framework coverage (incl. usage detail attrs). |
| reference/reports/execute-tool-span.md | Regenerated report including agent-framework tool execution coverage. |
| reference/README.md | Updates the scenario/report index to list agent-framework as supported. |
|
Copilot has reviewed this PR. Copilot's suggestions aren't always correct or applicable, so please evaluate each comment on its merits and then handle it in one of these ways:
Automation flags a PR for human review once every Copilot comment has a reply or is marked as resolved, so keeping these threads up to date helps reviewers know when the PR is ready. Status across open PRs is visible on the pull request dashboard. |
lmolkova
left a comment
There was a problem hiding this comment.
Thanks! It seems ADK does not produce the workflow span, but produces inference ones. Inference span would likely be duplicated by the underlying openai instrumentation, we should try to avoid emitting them.
Add native Agent Framework reference coverage using OpenAIChatClient against the mock OpenAI Responses endpoint. Extend the mock Responses endpoint with function-call and usage-detail payloads needed by Agent Framework native telemetry, then refresh the reference reports. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make the OpenAI Responses mock robust to alternate input and tool shapes while preserving native Agent Framework coverage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep the mock Responses tool-call branch scoped to local tool execution so Azure AI Foundry hosted-agent requests continue returning final message output. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
4e4166a to
da9d042
Compare
Add native chat-client tool-call and chat-completions agent paths. The chat-completions agent path covers additional request options emitted by Agent Framework native telemetry. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@lmolkova thanks, I fixed the pipeline and noticed some things missing, should be good now. |
|
thanks @eavanvalkenburg ! JFYI - I'm proposing to remove all inference manual instrumentation from reference scenarios here #351 with some principles outlined in AGENTS.md - would love to know if you have any thoughts or concerns Non-editorial contributions need two approvals, given it's documenting what Agent Framework de-facto does, I'm going to consider this PR to be editorial and merge it. Thanks! |
|
@lmolkova i do have some concern, as far as I'm aware, we instrument everything in Agent framework and then we do use openai or Anthropic or others to call models, but I've never seen them emitting things, likely those libraries do not emit by default (I might have seen logs, but def not spans and metrics). So this change would require all libraries like ours to figure out for each provider how to set that up, or make the dev ex for our customers worse because they have to do two instrumentation setups, and if they want to switch provider it goes from a single line change up multiple updates to still get otel. So while I can follow the thinking in theory, make each thing responsible for its own features, the practice is not as simple. Related to #351 |
Changes
OpenAIChatClient.Notes
This scenario intentionally reflects what Agent Framework emits natively today. It does not add manual reference spans for surfaces AF does not currently emit, such as GenAI workflow spans or inference operation detail events.