PDF detection — Phase 3: remaining four workers (commenting, assessing, linking, tagging) #737

Description

@adampingel

Part of #351. Fan the PDF branch out to the remaining four workers: commenting, assessing, linking, and tagging. The geometry tail (buildPdfAnnotation) is shared, and each worker's body is already built by its existing processXJob, so the phase is shippable one worker at a time.
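A minimal sketch of the split described above: one shared geometry tail, with the body supplied per worker. Every name and shape here (`buildPdfAnnotationSketch`, `Rect`, `TextBody`, `EntityRefBody`) is hypothetical; the real `buildPdfAnnotation` signature is not shown in this issue and may differ.

```typescript
// Hypothetical shapes for illustration; the real types are not shown in this issue.
type Motivation = "highlighting" | "commenting" | "assessing" | "linking" | "tagging";

interface Rect { page: number; x: number; y: number; width: number; height: number; }

// The two body shapes named in the test plan: generated text and entity reference.
type TextBody = { kind: "text"; value: string };
type EntityRefBody = { kind: "entity-ref"; entityId: string };
type Body = TextBody | EntityRefBody;

interface PdfAnnotationSketch { motivation: Motivation; body: Body; rects: Rect[]; }

// Stand-in for the shared geometry tail: it only attaches geometry to a body
// that the worker has already built, so it does not care which worker called it.
function buildPdfAnnotationSketch(
  motivation: Motivation,
  body: Body,
  rects: Rect[],
): PdfAnnotationSketch {
  // Copy the rects so later mutation by the caller cannot change the annotation.
  return { motivation, body, rects: rects.map((r) => ({ ...r })) };
}

// Same geometry, two different bodies.
const rects: Rect[] = [{ page: 1, x: 72, y: 100, width: 120, height: 12 }];
const comment = buildPdfAnnotationSketch(
  "commenting",
  { kind: "text", value: "Needs a citation." },
  rects,
);
const link = buildPdfAnnotationSketch(
  "linking",
  { kind: "entity-ref", entityId: "entity-123" },
  rects,
);
```

Under this assumption, adding a worker means supplying a new body only, which is what would make one-worker-at-a-time shipping practical.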

Per-worker notes

  • commenting / assessing — generated-text body; geometry identical to highlighting; lowest risk.
  • linking — entity-reference body; occurrence disambiguation matters most here (a wrong occurrence draws a rect on the wrong page); the first-of-many / prefix-suffix case carries more weight.
  • tagging — structural; v1 uses the same word-union geometry; block-level bbox is a later refinement, not v1.
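The occurrence-disambiguation and word-union points above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `pickOccurrence`, `unionByPage`, `Word`, and `Box` are hypothetical names, the selector is assumed to carry `exact`, `prefix`, and `suffix` strings, and "word-union" is read here as the bounding box of the matched words on each page.

```typescript
// Hypothetical word box as a PDF text extractor might report it.
interface Word { text: string; page: number; x0: number; y0: number; x1: number; y1: number; }
interface Box { page: number; x0: number; y0: number; x1: number; y1: number; }

// Collect the start offset of every occurrence of `exact` in `text`.
function findOccurrences(text: string, exact: string): number[] {
  const hits: number[] = [];
  if (exact.length === 0) return hits;
  let from = 0;
  for (;;) {
    const i = text.indexOf(exact, from);
    if (i === -1) break;
    hits.push(i);
    from = i + 1;
  }
  return hits;
}

// Return the first occurrence whose surrounding text matches prefix and suffix.
// With empty prefix and suffix this degrades to "first of many".
// Returns -1 when nothing matches, so the caller can skip drawing a rect.
function pickOccurrence(text: string, exact: string, prefix = "", suffix = ""): number {
  for (const start of findOccurrences(text, exact)) {
    const end = start + exact.length;
    const prefixOk = text.slice(0, start).endsWith(prefix);
    const suffixOk = text.slice(end).startsWith(suffix);
    if (prefixOk && suffixOk) return start;
  }
  return -1;
}

// Union the boxes of the matched words, one box per page.
function unionByPage(words: Word[]): Box[] {
  const byPage = new Map<number, Box>();
  for (const w of words) {
    const b = byPage.get(w.page);
    if (b === undefined) {
      byPage.set(w.page, { page: w.page, x0: w.x0, y0: w.y0, x1: w.x1, y1: w.y1 });
    } else {
      b.x0 = Math.min(b.x0, w.x0);
      b.y0 = Math.min(b.y0, w.y0);
      b.x1 = Math.max(b.x1, w.x1);
      b.y1 = Math.max(b.y1, w.y1);
    }
  }
  return Array.from(byPage.values());
}

const text = "the cat sat; the cat ran";
const firstOfMany = pickOccurrence(text, "cat");
const disambiguated = pickOccurrence(text, "cat", "the ", " ran");
```

Returning -1 instead of falling back to the first hit is a deliberate choice in this sketch: for linking, no rect is arguably better than a rect on the wrong page.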

Tests (gate this phase)

  • Unit/integration: one suite per motivation (mirroring the existing parser tests) — selector geometry + body shape + dedupe.
  • e2e: one spec per motivation (or a representative subset — a comment + a linking reference, the two body shapes) on the seeded PDF.
  • Live: run each motivation on a real text PDF; confirm geometry + body quality.
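For the dedupe part of the unit suites, one possible shape is a key built from motivation, matched text, and quantized geometry. This is only a sketch: the issue does not say how duplicates are defined, so `dedupeKey`, `dedupe`, the `Detected` shape, and the 0.1-unit rounding are all assumptions.

```typescript
// Hypothetical detected-annotation shape for illustration.
interface Rect { page: number; x: number; y: number; width: number; height: number; }
interface Detected { motivation: string; exact: string; rects: Rect[]; }

// Round to one decimal so tiny float differences between runs do not defeat dedupe.
const quantize = (n: number): number => Math.round(n * 10) / 10;

// Build a stable string key from motivation, matched text, and quantized rects.
function dedupeKey(a: Detected): string {
  const geometry = a.rects
    .map((r) => `${r.page}:${quantize(r.x)},${quantize(r.y)},${quantize(r.width)},${quantize(r.height)}`)
    .join(";");
  return `${a.motivation}|${a.exact}|${geometry}`;
}

// Keep the first annotation seen for each key, preserving input order.
function dedupe(items: Detected[]): Detected[] {
  const seen = new Set<string>();
  const out: Detected[] = [];
  for (const item of items) {
    const key = dedupeKey(item);
    if (!seen.has(key)) {
      seen.add(key);
      out.push(item);
    }
  }
  return out;
}

const detections: Detected[] = [
  { motivation: "commenting", exact: "cat", rects: [{ page: 1, x: 10.01, y: 20, width: 30, height: 12 }] },
  { motivation: "commenting", exact: "cat", rects: [{ page: 1, x: 10.02, y: 20, width: 30, height: 12 }] },
  { motivation: "linking", exact: "cat", rects: [{ page: 1, x: 10.01, y: 20, width: 30, height: 12 }] },
];
const unique = dedupe(detections);
```

A per-motivation suite could then assert selector geometry, body shape, and that a repeated detection run does not grow the result.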

Done

All five motivations detect and render on text PDFs; per-motivation tests are green; live spot-checks are done.

    Projects

    Status
    Backlog
