Investigate and propose a performance budget for axe-core

## Summary

axe-core has no defined performance budget. We track functional correctness exhaustively in
CI, but nothing measures, baselines, or gates **bundle size** or **runtime cost**. The pieces to
build one mostly exist already — they're just report-only and never compared against a limit.

**This is an investigation, not an implementation.** The deliverable is a written, reviewable
**performance-budget proposal** produced by the assigned engineer — covering what to measure, the
reference page it's measured against, candidate thresholds, and a recommended path to enforce it.
Actually building the harness, metric export, and CI gates is **follow-up work** scoped out of this
issue. The sections below are starting material for that investigation, not a committed design.

The proposal should, at minimum, address: (1) a **reference DOM profile** — an explicit, documented
target page size that every runtime number is calculated against (the engineer chooses the number
deliberately; this issue doesn't prescribe it), (2) which performance dimensions to budget and
proposed threshold values, and (3) a recommendation for how enforcement would eventually plug into
the build/CI so regressions are caught at PR time instead of discovered by consumers.

## Motivation

- axe-core runs **inside the consumer's page** on every audit, so both payload size and execution
  time are felt directly by users. The README markets axe as "fast, secure, lightweight"
  (`README.md:11`) but attaches no numbers to that claim.
- We already acknowledge a real ceiling: `doc/API.md:962-992` documents that pages with >50K
  elements can take 10s+, and points consumers at `resultTypes` and skipping color-contrast as
  mitigations. That's a reactive workaround for a cost we don't currently measure or bound.
- There have been 50+ `perf:` commits since 2014 (color-contrast v4.0, selector caching #4611,
  runPartial timing #5103, …). `perf:` is a sanctioned commit type. We invest in performance but
  have no guardrail to keep those gains from silently eroding.

## Current state (what we already have)

**Runtime instrumentation — rich, but console-only.**
`lib/core/utils/performance-timer.js` is a full User Timing wrapper, enabled per run via
`performanceTimer: true` (`doc/API.md:436`, wired in `lib/core/public/run.js:36-37`). It emits a
layered breakdown via `window.performance.mark()`/`measure()`:

- Total audit (`axe`) and per-frame audit (`audit_start_to_end`) — `performance-timer.js:30-71`,
  `lib/core/public/run-rules.js:35-50`
- Per-rule total (`rule_[ruleId]`), gather (`#gather`), visibility filtering, matches (`#matches`),
  and checks (`runchecks_[ruleId]`) — `lib/core/base/rule.js:130-166, 369-404, 445-474`
- `after` processing (`audit.after`) and `reporter` — `run-rules.js:62-73`, `run.js:42-45,67-69`
- Gather logging already includes node counts: `gather for [ruleId] ([count] nodes): [ms]ms`
  (`rule.js:382-387`)
- Works under `runPartial` too (`run-partial.js:31-32,38-39`)

Covered by `test/core/utils/performance-timer.js` (15+ cases).

**Build pipeline — a first-party Node script.**
`npm run build` → `node build/run-build.mjs` (`package.json:79`) →
`build/run-build/full-build.mjs:18-71`, which runs clean → validate → metadata → esbuild →
configure → babel → concat → uglify (`build/run-build/concat-uglify.mjs:59`, via `uglify-js`) →
aria-docs → locale template → postbuild → bytesize. This matters for a budget: the size-reporting
step is **first-party JS we control**, so adding a threshold is just editing our own build code. The
pipeline even has its own unit tests, run via `npm run test:build` →
`node --test build/run-build/*.test.mjs` (`package.json:124`).

**Bundle-size reporting — report-only.**
The final build step is a homegrown `runBytesize()` (`build/run-build/postbuild.mjs:20-32`, invoked
at `full-build.mjs:64-65`). It iterates every locale variant × `.js`/`.min.js`, stats the file, and
just `console.log`s `${name}: ${bytes} bytes` — no threshold, no gzip, no diff. Current sizes:

| Artifact | Raw | Gzipped |
|---|---|---|
| `axe.js` (published `main`, `package.json:56`) | ~1.2 MB (1,292,628 B) | — |
| `axe.min.js` | ~561 KB (574,862 B) | ~151 KB (154,444 B) |

There's strong prior art for a hard size gate: `assertEsbuildImportLimits()`
(`build/run-build/esbuild-core.mjs:26-40`) already **fails the build** via `assert` when a module
exceeds a max import count or `maxSize` — but it's applied only to the `gather-internals` entry
(`{ max: 10, maxSize: 4000 }`, `esbuild-core.mjs:69-72`), not the shipped bundle.

**CI gating.**
`.github/workflows/test.yml` runs ~15 jobs; `deploy.yml:26-56,125-126` makes Test-workflow success
the **sole** merge blocker. The `build` job (`test.yml:55-71`) already runs `npm run prepare &&
npm run build` (so bytesize runs) and uploads `axe.js` as an artifact (`test.yml:67-70`) consumed by
downstream jobs via `download-artifact` (`test.yml:147`).

## Gaps a budget must fill

1. **No machine-readable metrics.** `logMeasures()` only does
   `this._log('Measure ' + name + ' took ' + duration + 'ms')` (`performance-timer.js:111-144`);
   timing is **not** in the results object, and marks are cleared after measurement by default
   (`:92-104`), so a `PerformanceObserver` can't reliably catch them. A budget needs a structured
   extraction path. *(net-new)*
2. **No benchmark suite / representative fixtures.** No large-DOM stress pages exist
   (`test/assets/` is media only). No baselines, no time-series, no cross-version comparison.
   *(net-new)*
3. **No thresholds on the bundle.** `runBytesize()` only logs; nothing fails on regression. No
   gzip/brotli sizing. (Contrast with `assertEsbuildImportLimits()`, which *does* hard-fail — but
   only for the `gather-internals` module, not the shipped bundle.)
4. **No CI performance gate.** `.github/workflows/` has zero references to size/perf/budget;
   `runBytesize()` output is buried in the build logs — never a step summary, never an artifact,
   never diffed base-vs-head.
5. **No per-check granularity** (checks collapse into `runchecks_[ruleId]`) and **no
   frame-collection/iframe-overhead timing**.
6. **No memory tracking** — wall-clock only.

## Reference DOM profile — the anchor for every runtime number

A wall-clock or per-rule budget is meaningless without a defined page to run it against: "axe takes
40ms" only means something relative to *how big the DOM was*. So a core part of the proposal is for
the engineer to **define a canonical reference DOM** — a documented target page size (and shape)
that all runtime calculations are expressed against — rather than measuring against arbitrary,
undocumented fixtures.

This issue does **not** prescribe the number; that determination is the engineer's to make. What it
does require is that the determination be made deliberately. The following are offered purely as
**example data points** to feed that decision so the thinking happens — not as the answer:

- **Lighthouse's "Avoid an excessive DOM size" audit** — a widely recognized public threshold for
  DOM weight: it *warns* once a page exceeds **~800 nodes** and *scores poorly* around **~1,400
  nodes**, with companion guidance of DOM depth **≤ 32 levels** and **≤ 60 child elements** under any
  single parent.
- axe's own documented pain point of **>50K elements** taking 10s+ (`API.md:962-970`), as the far end
  of the scale.

From inputs like these, the engineer should decide and document: the **target** size the budget's
headline number is anchored to, any **secondary/warning** band, whether larger sizes are tracked as a
**scaling** check (linear-growth, not pass/fail), and — importantly — the DOM **shape**, not just a
node count (realistic depth/breadth, plus variants for known-expensive cases like iframe-heavy or
color-contrast-heavy pages). Every ceiling in Dimensions C and D below is then stated *per this
reference*, with per-rule budgets normalized to a per-node cost (e.g. ms / 1,000 nodes) so they stay
fixture-independent.

## Candidate budget dimensions to evaluate

A menu for the engineer to assess and recommend from — not a committed scope. Ordered roughly by
implementation realism, with the rough cost of eventual follow-up work flagged to inform
prioritization. Dimensions C and D would be expressed against the **Reference DOM profile** above.

| # | Dimension | Why it matters | How we'd measure it | New work |
|---|---|---|---|---|
| **A** | Minified bundle size (`axe.min.js`, English) | Lightest lift — `runBytesize()` already stats it in the build; a hard, consumer-facing payload number today (574,862 B). | Add a threshold check to `runBytesize()` (`postbuild.mjs:20-32`) that `assert`s against a committed ceiling — directly mirroring `assertEsbuildImportLimits()` (`esbuild-core.mjs:26-40`), which already fails the build the same way. Failing `npm run build` auto-gates the CI `build` job. | Threshold file + assert. **Small.** |
| **B** | Gzipped bundle size | Real-world delivery cost; raw bytes overstate network impact. | gzip the built `axe.min.js` inside `runBytesize()`, compare to ceiling. | gzip step (no gzip/brotli sizing exists today). **Small.** |
| **C** | Cold-run wall-clock on the reference DOM | The metric consumers actually feel. The headline budget is the total `axe` time on the chosen target page; any secondary/stress bands the engineer defines are tracked too (larger ones as a scaling check against the documented >50K pain, `API.md:962-970`). | Build the reference fixtures, run `axe.run({ performanceTimer: true })`, capture the total `axe` measure per band. Browser harness can reuse `test/wtr.config.mjs`; Node path can reuse the JSDOM job. | Fixtures + runnable harness + structured metric export. **Medium–large** — the flagship investment. |
| **D** | Per-rule time ceiling (per-node, anchored to the reference) | Catches a regression in *one* rule (historical hot spots: color-contrast, td-has-headers) before total time blows up. Granularity already exists. | Parse `rule_[ruleId]` measures from a reference run; assert each stays under a per-node ceiling (e.g. ms / 1,000 nodes) normalized to the reference so it's fixture-independent. Node counts already logged (`rule.js:382-387`). | Structured export (Gap 1) + per-rule baseline table. **Medium**; depends on C's harness. |
| **E** | Memory / peak heap | Catches allocation regressions invisible to wall-clock. | No existing hook. | Fully net-new. **Defer — out of scope for v1.** |

**A plausible phasing the proposal might recommend:** **A** first (near-free, immediate value), **B**
alongside it, then **C** as the flagship runtime budget, with **D** layered on once a harness emits
structured metrics. **E** deferred.

## Enforcement options to consider

Background on *where* enforcement could eventually live, to inform the proposal's recommendation —
implementing it is follow-up work, not part of this issue.

- **Single merge gate already exists.** `deploy.yml:26-56` treats the whole Test workflow as the
  sole merge blocker, so any check added inside `.github/workflows/test.yml` inherits blocking status
  with no branch-protection change — likely cheaper than a separate workflow.
- **A/B (bundle):** one surfaced option is to enforce *in the build itself* — have `runBytesize()`
  (`postbuild.mjs:20-32`) `assert` against committed ceilings the same way `assertEsbuildImportLimits()`
  already does, so a breach fails `npm run build` (and thus the existing `build` job, `test.yml:55-71`)
  with no new CI wiring. The build pipeline is unit-tested (`npm run test:build`), giving a second
  possible home for size assertions.
- **C/D (runtime):** likely a dedicated job that `download-artifact`s the already-uploaded `axe.js`
  (`test.yml:147`) and runs a benchmark harness, kept separate from functional jobs so a timing flake
  is diagnosable in isolation.
- **Base-vs-head delta detection** (regression %, not just an absolute ceiling) is the highest-value
  and least-obvious piece, since absolute ceilings drift as features land — worth a specific
  recommendation in the proposal.

## Questions the proposal should resolve

The investigation should land a recommendation on each of these (or explicitly defer it):

1. **Absolute ceilings, relative deltas, or both?** Absolute is trivial with bytesize; relative
   catches creep but needs a baseline-fetch mechanism that doesn't exist yet.
2. **Where do baselines live?** A committed JSON (analogous to `sri-history.json`) is the path of
   least resistance, but couples every legit perf change to a baseline-bump commit. Or fetch the
   base branch's build at CI time?
3. **Runtime environment for the budget.** GitHub runners are noisy — wall-clock thresholds need
   generous tolerance (the timer's own tests use a 17ms `ANIMATION_FRAME_TOLERANCE_MS`). Browser
   (WTR) vs. Node/JSDOM (where color-contrast doesn't run, per `README.md:83`) yield materially
   different numbers; pick one as the gating environment.
4. **Structured-metrics API.** Does `performanceTimer` gain a structured/returnable output (vs.
   console-only `logMeasures`), or does the harness scrape
   `window.performance.getEntriesByType('measure')` before marks are cleared? This is the
   prerequisite for C/D and the one change touching shipped code (`performance-timer.js`).
5. **Reference DOM anchor.** What target size (and shape) does the headline budget anchor to? This
   is the engineer's call to make and document, using inputs like Lighthouse's ~800/~1,400-node
   thresholds and axe's >50K pain point as guidance. Also decide which variants (iframe-heavy,
   color-contrast-heavy) become official fixtures — these are a maintenance surface.
6. **Failure policy.** Hard-fail the PR, or warn + label override? Which dimensions should block vs.
   report-only to start?
7. **Per-rule normalization.** Budget on per-node cost rather than absolute ms so Dimension D stays
   fixture-independent?
8. **Bundle scope.** English-only `axe.min.js`, or all locale variants (`runBytesize()` already
   loops over every `langs` suffix, `postbuild.mjs:21-23`)? Does `axe.js` (the published `main`)
   also get a budget, or only the minified artifact?

## Done = a budget proposal the team can act on

This issue is complete when the engineer has **presented a written performance-budget proposal** for
the team to review. That proposal should:

- [ ] State a **reference DOM target** (size + shape), with the rationale and the data points
      considered — the central decision this work exists to force.
- [ ] Recommend **which dimensions to budget** and propose concrete starting **threshold values** (or
      a clear method for deriving them).
- [ ] Recommend **how enforcement would eventually work** — where it would plug into the build/CI and
      the failure policy — at enough depth to scope the follow-up, without building it.
- [ ] Resolve (or explicitly defer, with a recommendation) the open questions above.
- [ ] Break the implementation into **follow-up tickets** (e.g. metric export, benchmark harness,
      bundle-size gate, CI wiring) so the build work can be planned separately.

## Out of scope for this issue

- **All implementation** — harness, structured metric export, gzip sizing, threshold files,
  base-vs-head diffing, and CI gates are follow-up work once the proposal is accepted.
- Memory / heap budgeting (Dimension E).
- Per-check timing granularity and iframe-overhead timing.
- External monitoring dashboards (SpeedCurve, Datadog, bundle-analyzer, etc.).

---

*Everything cited above exists in-repo today and is provided as starting material for the
investigation. Everything marked **net-new** (benchmark harness, fixtures, structured metric export,
gzip sizing, base-vs-head diffing, threshold files) is follow-up work, not part of this issue.*


#	Dimension	Why it matters	How we'd measure it	New work
A	Minified bundle size (`axe.min.js`, English)	Lightest lift — `runBytesize()` already stats it in the build; a hard, consumer-facing payload number today (574,862 B).	Add a threshold check to `runBytesize()` (`postbuild.mjs:20-32`) that `assert`s against a committed ceiling — directly mirroring `assertEsbuildImportLimits()` (`esbuild-core.mjs:26-40`), which already fails the build the same way. Failing `npm run build` auto-gates the CI `build` job.	Threshold file + assert. Small.
B	Gzipped bundle size	Real-world delivery cost; raw bytes overstate network impact.	gzip the built `axe.min.js` inside `runBytesize()`, compare to ceiling.	gzip step (no gzip/brotli sizing exists today). Small.
C	Cold-run wall-clock on the reference DOM	The metric consumers actually feel. The headline budget is the total `axe` time on the chosen target page; any secondary/stress bands the engineer defines are tracked too (larger ones as a scaling check against the documented >50K pain, `API.md:962-970`).	Build the reference fixtures, run `axe.run({ performanceTimer: true })`, capture the total `axe` measure per band. Browser harness can reuse `test/wtr.config.mjs`; Node path can reuse the JSDOM job.	Fixtures + runnable harness + structured metric export. Medium–large — the flagship investment.
D	Per-rule time ceiling (per-node, anchored to the reference)	Catches a regression in one rule (historical hot spots: color-contrast, td-has-headers) before total time blows up. Granularity already exists.	Parse `rule_[ruleId]` measures from a reference run; assert each stays under a per-node ceiling (e.g. ms / 1,000 nodes) normalized to the reference so it's fixture-independent. Node counts already logged (`rule.js:382-387`).	Structured export (Gap 1) + per-rule baseline table. Medium; depends on C's harness.
E	Memory / peak heap	Catches allocation regressions invisible to wall-clock.	No existing hook.	Fully net-new. Defer — out of scope for v1.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Investigate and propose a performance budget for axe-core #5176

Summary

Motivation

Current state (what we already have)

Gaps a budget must fill

Reference DOM profile — the anchor for every runtime number

Candidate budget dimensions to evaluate

Enforcement options to consider

Questions the proposal should resolve

Done = a budget proposal the team can act on

Out of scope for this issue

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Artifact	Raw	Gzipped
`axe.js` (published `main`, `package.json:56`)	~1.2 MB (1,292,628 B)	—
`axe.min.js`	~561 KB (574,862 B)	~151 KB (154,444 B)

Uh oh!

Investigate and propose a performance budget for axe-core #5176

Description

Summary

Motivation

Current state (what we already have)

Gaps a budget must fill

Reference DOM profile — the anchor for every runtime number

Candidate budget dimensions to evaluate

Enforcement options to consider

Questions the proposal should resolve

Done = a budget proposal the team can act on

Out of scope for this issue

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions