UN-2771 [FEAT] Report text extraction time in API deployment metrics#2032
UN-2771 [FEAT] Report text extraction time in API deployment metrics#2032athul-rs wants to merge 2 commits into
Conversation
The structure tool timed indexing but not the text extraction (LLMWhisperer/X2Text) call, so API responses with include_metrics=True reported indexing time only. Time dynamic_extraction the same way and merge it into the result metrics as extraction.time_taken(s). Bump structure tool to 0.0.102. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
WalkthroughThe structure pipeline executor now tracks extraction timing by initializing and recording metrics during extraction execution, capturing the start time only when extraction is not skipped, measuring the duration after successful completion, and passing the recorded metrics to the pipeline finalization method to be merged with existing metrics. ChangesExtraction timing metrics in pipeline execution
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
| Filename | Overview |
|---|---|
| workers/executor/executors/legacy_executor.py | Adds extraction-timing metrics alongside existing indexing metrics; logic is correct for all pipeline paths (skip-extraction, summarization, single-pass, normal) |
Sequence Diagram
sequenceDiagram
participant Caller
participant LegacyExecutor
participant HandleExtract
participant HandleAnswer
Caller->>LegacyExecutor: _run_pipeline()
Note over LegacyExecutor: extraction_metrics = {}
alt not skip_extraction
Note over LegacyExecutor: extraction_start = time.monotonic()
LegacyExecutor->>HandleExtract: _handle_extract(extract_ctx)
HandleExtract-->>LegacyExecutor: extract_result
Note over LegacyExecutor: extraction_metrics = {"extraction": {"time_taken(s)": elapsed}}
end
LegacyExecutor->>HandleAnswer: _run_pipeline_answer_step()
HandleAnswer-->>LegacyExecutor: answer_result
LegacyExecutor->>LegacyExecutor: _finalize_pipeline_result(index_metrics, extraction_metrics)
Note over LegacyExecutor: new_metrics = merge(index_metrics, extraction_metrics)
Note over LegacyExecutor: structured_output[metrics] = merge(existing, new_metrics)
LegacyExecutor-->>Caller: ExecutionResult with metrics
Reviews (2): Last reviewed commit: "UN-2771 Rework: capture extraction time ..." | Re-trigger Greptile
chandrasekharan-zipstack
left a comment
There was a problem hiding this comment.
@athul-rs I like your intent here however I don't think the tool related code or logic is being used anymore. This is deprecated and in fact we'll be removing this code soon. Can you check if this is handled in the equivalent flow involving the celery tasks and ensure this concern is addressed?
cc: @harini-venkataraman
Per review, the structure tool's Docker path is deprecated — the live
flow is the celery-based LegacyExecutor structure pipeline. Time the
extract step there and merge {'extraction': {'time_taken(s)': ...}}
into the result metrics alongside the existing per-output indexing
timing. Structure tool changes reverted (no tool version bump needed).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@chandrasekharan-zipstack you were right — reworked. The |
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@workers/executor/executors/legacy_executor.py`:
- Around line 645-647: The current extraction_metrics dict uses a top-level key
"extraction" which can collide with user-defined output/prompt names when merged
later; update this to use a reserved pipeline namespace (e.g., set
extraction_metrics = {"_pipeline": {"extraction": {"time_taken(s)": ...}}}) and
then merge into the main metrics map so pipeline-level metrics live under
metrics["_pipeline"]; adjust the merge logic where metrics and
extraction_metrics are combined (the existing merge around the metrics variable
at lines ~804-811) to preserve the "_pipeline" namespace, or alternatively add a
pre-merge check to disallow user outputs named "extraction" if you prefer the
blocking approach. Ensure you update references to extraction_metrics and the
merge operation accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d5ab6512-472a-43ec-a802-245f014d585e
📒 Files selected for processing (1)
workers/executor/executors/legacy_executor.py
| extraction_metrics = { | ||
| "extraction": {"time_taken(s)": time.monotonic() - extraction_start} | ||
| } |
There was a problem hiding this comment.
Avoid top-level metrics key collision with user prompt names
"extraction" is currently injected as a normal top-level metrics key (Line [645]), but top-level metrics keys are also used for output/prompt names. If a user defines an output named "extraction", the merge at Line [804] will silently co-mingle unrelated metrics under the same key.
Use a reserved namespace for pipeline-level metrics (e.g., metrics["_pipeline"]["extraction"]) or enforce/reserve "extraction" as a disallowed output name before merge.
Also applies to: 804-811
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@workers/executor/executors/legacy_executor.py` around lines 645 - 647, The
current extraction_metrics dict uses a top-level key "extraction" which can
collide with user-defined output/prompt names when merged later; update this to
use a reserved pipeline namespace (e.g., set extraction_metrics = {"_pipeline":
{"extraction": {"time_taken(s)": ...}}}) and then merge into the main metrics
map so pipeline-level metrics live under metrics["_pipeline"]; adjust the merge
logic where metrics and extraction_metrics are combined (the existing merge
around the metrics variable at lines ~804-811) to preserve the "_pipeline"
namespace, or alternatively add a pre-merge check to disallow user outputs named
"extraction" if you prefer the blocking approach. Ensure you update references
to extraction_metrics and the merge operation accordingly.
Unstract test resultsPer-group results
Critical paths
|
jaseemjaskp
left a comment
There was a problem hiding this comment.
Automated review (PR Review Toolkit) of the extraction-time metric.
Net assessment: the merge logic is correct, empty-dict-safe, and introduces no silent failures — error propagation (LegacyExecutorError re-raise) is byte-for-byte unchanged. No blocking bugs. The substantive open item is the top-level-key collision already flagged by @coderabbitai on L647; the inline notes below are distinct consistency/maintainability/test points.
Biggest gap — test coverage (not inline-able; test_phase5d.py isn't in this diff): the PR adds zero tests for the new behavior. _finalize_pipeline_result / _merge_pipeline_metrics are exercised by test_phase5d.py (test_index_metrics_merged L261, test_merge_* L915-927) but none assert on extraction_metrics. Recommend adding before merge:
test_extraction_metrics_recorded— normal run:result.data["metrics"]["extraction"]["time_taken(s)"]is afloat >= 0and coexists with per-prompt keys.test_extract_failure_records_no_extraction_metric— metric is computed after the_failureearly-return, so a failed extract must record no timing (pin via a_finalize_pipeline_resultspy /assert_not_called).test_index_and_extraction_metrics_merged— both families merge (top-levelextraction+ per-outputindexing).test_skip_extraction_records_no_extraction_metric—skip_extraction⇒ no"extraction"key.
| _absorb(extract_result) | ||
| extracted_text = extract_result.data.get(IKeys.EXTRACTED_TEXT, "") | ||
| extraction_metrics = { | ||
| "extraction": {"time_taken(s)": time.monotonic() - extraction_start} |
There was a problem hiding this comment.
Maintainability (distinct from the key-collision note on L647): this hardcodes the literal "extraction", but PromptServiceConstants.EXTRACTION = "extraction" already exists and is imported as PSKeys in this file — use PSKeys.EXTRACTION so the metric key and the constant can't drift. Separately, "time_taken(s)" is duplicated here and at L1031 (index_metrics); consider promoting it to a shared constant.
Minor readability: for parity with the indexing path (elapsed = ...; index_metrics[output_name] = {"indexing": {"time_taken(s)": elapsed}}, L1029-1031), capture the duration in a named local first:
extraction_time = time.monotonic() - extraction_start
extraction_metrics = {PSKeys.EXTRACTION: {"time_taken(s)": extraction_time}}| # ---- Step 1: Extract ---- | ||
| if not skip_extraction: | ||
| step += 1 | ||
| extraction_start = time.monotonic() |
There was a problem hiding this comment.
Clock-source consistency: extraction times the step with time.monotonic(), while the sibling indexing metric uses wall-clock datetime.datetime.now() (L1029). time.monotonic() is the correct choice for duration measurement (immune to NTP / system-clock adjustments), so this code is right — but the two producers of "time_taken(s)" now use different clocks. Worth a follow-up to align the indexing path onto monotonic() too.
| source_file_name: str, | ||
| extracted_text: str, | ||
| index_metrics: dict, | ||
| extraction_metrics: dict | None = None, |
There was a problem hiding this comment.
The new extraction_metrics param isn't reflected in the docstring. Its shape differs from index_metrics in a load-bearing way: extraction_metrics is a top-level {"extraction": {"time_taken(s)": float}} (a sibling of per-output keys), whereas index_metrics is keyed per output {output_name: {"indexing": {...}}}. A short Args: note documenting both shapes (and that this is None/empty when extraction is skipped) would stop a future maintainer from incorrectly nesting/merging them. Consider typing it as dict[str, dict] | None or a small TypedDict rather than bare dict.



What
include_metrics=Truenow report the text extraction (LLMWhisperer/X2Text) duration asmetrics.extraction["time_taken(s)"], alongside the existing per-output indexing time.Why
UN-2771 — metrics only included indexing time; the LLMWhisperer call (often the dominant cost for large documents) was invisible, making it hard to attribute slow executions.
How
Reworked per review (@chandrasekharan-zipstack): the original version added timing to
tools/structure(deprecated Docker path); that's reverted. The change now lives in the live celery flow:LegacyExecutor._handle_structure_pipelinetimes the extract step (time.monotonic()) and_finalize_pipeline_resultmerges{"extraction": {"time_taken(s)": ...}}into result metrics via the existing_merge_pipeline_metrics, exactly mirroring how_index_pipeline_outputrecords indexing time. The Prompt Studio IDE flow (_handle_ide_index) is untouched.Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
indexingmetrics and response shape unchanged. Paths that skip extraction (smart-table) leave the dict empty — nothing added, same as today.Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Notes on Testing
include_metrics=True→ response metrics containextraction.time_taken(s); smart-table path unchanged.Screenshots
N/A
Checklist
I have read and understood the Contribution Guidelines.
🤖 Generated with Claude Code