Summary
axe-core has no defined performance budget. We track functional correctness exhaustively in
CI, but nothing measures, baselines, or gates bundle size or runtime cost. The pieces to
build one mostly exist already — they're just report-only and never compared against a limit.
This is an investigation, not an implementation. The deliverable is a written, reviewable performance-budget proposal produced by the assigned engineer — covering what to measure, the
reference page it's measured against, candidate thresholds, and a recommended path to enforce it.
Actually building the harness, metric export, and CI gates is follow-up work scoped out of this
issue. The sections below are starting material for that investigation, not a committed design.
The proposal should, at minimum, address: (1) a reference DOM profile — an explicit, documented
target page size that every runtime number is calculated against (the engineer chooses the number
deliberately; this issue doesn't prescribe it), (2) which performance dimensions to budget and
proposed threshold values, and (3) a recommendation for how enforcement would eventually plug into
the build/CI so regressions are caught at PR time instead of discovered by consumers.
Motivation
axe-core runs inside the consumer's page on every audit, so both payload size and execution
time are felt directly by users. The README markets axe as "fast, secure, lightweight"
(README.md:11) but attaches no numbers to that claim.
We already acknowledge a real ceiling: doc/API.md:962-992 documents that pages with >50K
elements can take 10s+, and points consumers at resultTypes and skipping color-contrast as
mitigations. That's a reactive workaround for a cost we don't currently measure or bound.
perf: commits have landed since 2014 (color-contrast v4.0; selector caching in "perf(selector): more caching for faster selector creation", #4611; runPartial timing in "chore: add perf timing to runPartial", #5103; …), and perf: is a sanctioned commit type. We invest in performance but have no guardrail to keep those gains from silently eroding.
Current state (what we already have)
Runtime instrumentation — rich, but console-only. lib/core/utils/performance-timer.js is a full User Timing wrapper, enabled per run via performanceTimer: true (doc/API.md:436, wired in lib/core/public/run.js:36-37). It emits a
layered breakdown via window.performance.mark()/measure():
Total audit (axe) and per-frame audit (audit_start_to_end) — performance-timer.js:30-71, lib/core/public/run-rules.js:35-50
Per-rule total (rule_[ruleId]), gather (#gather), visibility filtering, matches (#matches),
and checks (runchecks_[ruleId]) — lib/core/base/rule.js:130-166, 369-404, 445-474
after processing (audit.after) and reporter — run-rules.js:62-73, run.js:42-45,67-69
Gather logging already includes node counts: gather for [ruleId] ([count] nodes): [ms]ms
(rule.js:382-387)
Works under runPartial too (run-partial.js:31-32,38-39)
Covered by test/core/utils/performance-timer.js (15+ cases).
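The mark/measure layering described above can be reproduced with the standard User Timing API. The following is a minimal sketch, not axe-core's implementation: the helper timeSection and the measure name rule_example are hypothetical, and it assumes a runtime that exposes a global performance object (browsers, or a recent Node.js).

```javascript
// Hypothetical helper: wraps a synchronous function in a pair of User Timing
// marks and records one measure spanning them.
function timeSection(name, fn) {
  const startMark = `${name}:start`;
  const endMark = `${name}:end`;
  performance.mark(startMark);
  try {
    return fn();
  } finally {
    performance.mark(endMark);
    performance.measure(name, startMark, endMark);
  }
}

// Stand-in workload; a real run would execute a rule here.
const total = timeSection('rule_example', () => {
  let sum = 0;
  for (let i = 0; i < 1e5; i++) sum += i;
  return sum;
});

// Read the measures as structured entries *before* clearing anything.
const measures = performance
  .getEntriesByType('measure')
  .filter(entry => entry.name === 'rule_example')
  .map(({ name, duration }) => ({ name, duration }));

// Cleanup; the entries above are no longer retrievable after this.
performance.clearMarks();
performance.clearMeasures();

console.log(total, measures);
```

The point of the ordering is that anything a harness wants to keep has to be copied out of the performance timeline before cleanup runs.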
Build pipeline — a first-party Node script. npm run build → node build/run-build.mjs (package.json:79) → build/run-build/full-build.mjs:18-71, which runs clean → validate → metadata → esbuild →
configure → babel → concat → uglify (build/run-build/concat-uglify.mjs:59, via uglify-js) →
aria-docs → locale template → postbuild → bytesize. This matters for a budget: the size-reporting
step is first-party JS we control, so adding a threshold is just editing our own build code. The
pipeline even has its own unit tests, run via npm run test:build → node --test build/run-build/*.test.mjs (package.json:124).
Bundle-size reporting — report-only.
The final build step is a homegrown runBytesize() (build/run-build/postbuild.mjs:20-32, invoked
at full-build.mjs:64-65). It iterates every locale variant × .js/.min.js, stats the file, and
just console.logs ${name}: ${bytes} bytes — no threshold, no gzip, no diff. Current sizes:
| Artifact | Raw | Gzipped |
| --- | --- | --- |
| axe.js (published main, package.json:56) | ~1.2 MB (1,292,628 B) | n/a |
| axe.min.js | ~561 KB (574,862 B) | ~151 KB (154,444 B) |
There's strong prior art for a hard size gate: assertEsbuildImportLimits()
(build/run-build/esbuild-core.mjs:26-40) already fails the build via assert when a module
exceeds a max import count or maxSize — but it's applied only to the gather-internals entry
({ max: 10, maxSize: 4000 }, esbuild-core.mjs:69-72), not the shipped bundle.
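Applying the same hard-fail pattern to the shipped bundle could look roughly like the sketch below. It is illustrative only: assertBundleSizes, the ceilings object, and its value are hypothetical (the ceiling is an arbitrary placeholder, not a proposed budget), a thrown Error stands in for the assert call the build code uses, and the real runBytesize() would stat files rather than receive sizes as an argument.

```javascript
// Hypothetical ceilings in bytes; placeholder values, not proposed budgets.
const ceilings = { 'axe.min.js': 600_000 };

// Throws for the first artifact that is missing or exceeds its ceiling.
// `sizes` maps artifact name -> measured size in bytes.
function assertBundleSizes(sizes, limits) {
  for (const [name, max] of Object.entries(limits)) {
    const actual = sizes[name];
    if (actual === undefined) {
      throw new Error(`${name}: no size measured`);
    }
    if (actual > max) {
      throw new Error(`${name}: ${actual} bytes exceeds ceiling of ${max} bytes`);
    }
  }
}

// Size quoted in this issue for the current English minified bundle.
assertBundleSizes({ 'axe.min.js': 574_862 }, ceilings); // under the placeholder ceiling
console.log('bundle sizes within ceilings');
```

Because the check throws, wiring it into the build would fail npm run build in the same way the existing import-limit assertion does.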
CI gating. .github/workflows/test.yml runs ~15 jobs; deploy.yml:26-56,125-126 makes Test-workflow success
the sole merge blocker. The build job (test.yml:55-71) already runs npm run prepare && npm run build (so bytesize runs) and uploads axe.js as an artifact (test.yml:67-70) consumed by
downstream jobs via download-artifact (test.yml:147).
Gaps a budget must fill
No machine-readable metrics. logMeasures() only does this._log('Measure ' + name + ' took ' + duration + 'ms') (performance-timer.js:111-144);
timing is not in the results object, and marks are cleared after measurement by default
(performance-timer.js:92-104), so a PerformanceObserver can't reliably catch them. A budget needs a structured
extraction path. (net-new)
No benchmark suite / representative fixtures. No large-DOM stress pages exist
(test/assets/ is media only). No baselines, no time-series, no cross-version comparison. (net-new)
No thresholds on the bundle. runBytesize() only logs; nothing fails on regression. No
gzip/brotli sizing. (Contrast with assertEsbuildImportLimits(), which does hard-fail — but
only for the gather-internals module, not the shipped bundle.)
No CI performance gate. .github/workflows/ has zero references to size/perf/budget; runBytesize() output is buried in the build logs: never a step summary, never an artifact,
never diffed base-vs-head.
No per-check granularity (checks collapse into runchecks_[ruleId]) and no
frame-collection/iframe-overhead timing.
No memory tracking — wall-clock only.
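Until a structured export exists, one stopgap for the first gap would be to scrape the console output. A sketch, assuming the log lines follow exactly the 'Measure <name> took <duration>ms' format quoted above; parseMeasureLog is a hypothetical helper and the sample lines are made up.

```javascript
// Matches lines of the form: "Measure <name> took <duration>ms".
const MEASURE_LINE = /^Measure (.+) took (\d+(?:\.\d+)?)ms$/;

// Hypothetical helper: turns captured log lines into { name, durationMs }
// records, ignoring any line that is not a measure line.
function parseMeasureLog(lines) {
  const records = [];
  for (const line of lines) {
    const match = MEASURE_LINE.exec(line.trim());
    if (match) {
      records.push({ name: match[1], durationMs: Number(match[2]) });
    }
  }
  return records;
}

// Made-up sample output for illustration.
const sample = [
  'Measure rule_color-contrast took 12.5ms',
  'some unrelated log line',
  'Measure axe took 40ms'
];
console.log(parseMeasureLog(sample));
```

Scraping couples the harness to a log format that is not a public contract, which is one argument for a structured output instead.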
Reference DOM profile — the anchor for every runtime number
A wall-clock or per-rule budget is meaningless without a defined page to run it against: "axe takes
40ms" only means something relative to how big the DOM was. So a core part of the proposal is for
the engineer to define a canonical reference DOM — a documented target page size (and shape)
that all runtime calculations are expressed against — rather than measuring against arbitrary,
undocumented fixtures.
This issue does not prescribe the number; that determination is the engineer's to make. What it
does require is that the determination be made deliberately. The following are offered purely as example data points to feed that decision so the thinking happens — not as the answer:
Lighthouse's "Avoid an excessive DOM size" audit — a widely recognized public threshold for
DOM weight: it warns once a page exceeds ~800 nodes and scores poorly around ~1,400
nodes, with companion guidance of DOM depth ≤ 32 levels and ≤ 60 child elements under any
single parent.
axe's own documented pain point of >50K elements taking 10s+ (API.md:962-970), as the far end
of the scale.
From inputs like these, the engineer should decide and document: the target size the budget's
headline number is anchored to, any secondary/warning band, whether larger sizes are tracked as a scaling check (linear-growth, not pass/fail), and — importantly — the DOM shape, not just a
node count (realistic depth/breadth, plus variants for known-expensive cases like iframe-heavy or
color-contrast-heavy pages). Every ceiling in Dimensions C and D below is then stated per this
reference, with per-rule budgets normalized to a per-node cost (e.g. ms / 1,000 nodes) so they stay
fixture-independent.
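The per-node normalization mentioned above is simple arithmetic: divide a rule's time by the number of nodes it processed and scale to 1,000 nodes. A sketch with a hypothetical helper name and made-up numbers:

```javascript
// Hypothetical helper: converts an absolute rule time into ms per 1,000 nodes.
function msPerThousandNodes(durationMs, nodeCount) {
  if (!(nodeCount > 0)) {
    throw new RangeError('nodeCount must be a positive number');
  }
  return (durationMs * 1000) / nodeCount;
}

// Made-up example: the same rule measured on two fixtures of different size.
const small = msPerThousandNodes(4, 800);
const large = msPerThousandNodes(250, 50_000);
console.log(small, large);
```

If the two normalized values diverge sharply between fixture sizes, the rule's cost is not growing linearly with node count, which is the kind of signal a scaling check would look for.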
Candidate budget dimensions to evaluate
A menu for the engineer to assess and recommend from — not a committed scope. Ordered roughly by
implementation realism, with the rough cost of eventual follow-up work flagged to inform
prioritization. Dimensions C and D would be expressed against the Reference DOM profile above.
| # | Dimension | Why it matters | How we'd measure it | New work |
| --- | --- | --- | --- | --- |
| A | Minified bundle size (axe.min.js, English) | Lightest lift: runBytesize() already stats it in the build; a hard, consumer-facing payload number today (574,862 B). | Add a threshold check to runBytesize() (postbuild.mjs:20-32) that asserts against a committed ceiling, directly mirroring assertEsbuildImportLimits() (esbuild-core.mjs:26-40), which already fails the build the same way. Failing npm run build auto-gates the CI build job. | Threshold file + assert. Small. |
| B | Gzipped bundle size | Real-world delivery cost; raw bytes overstate network impact. | gzip the built axe.min.js inside runBytesize(), compare to ceiling. | gzip step (no gzip/brotli sizing exists today). Small. |
| C | Cold-run wall-clock on the reference DOM | The metric consumers actually feel. The headline budget is the total axe time on the chosen target page; any secondary/stress bands the engineer defines are tracked too (larger ones as a scaling check against the documented >50K pain, API.md:962-970). | Build the reference fixtures, run axe.run({ performanceTimer: true }), capture the total axe measure per band. Browser harness can reuse test/wtr.config.mjs; Node path can reuse the JSDOM job. | |
| D | Per-rule time ceiling (per-node, anchored to the reference) | Catches a regression in one rule (historical hot spots: color-contrast, td-has-headers) before total time blows up. Granularity already exists. | Parse rule_[ruleId] measures from a reference run; assert each stays under a per-node ceiling (e.g. ms / 1,000 nodes) normalized to the reference so it's fixture-independent. Node counts already logged (rule.js:382-387). | |
| E | Memory / heap | Catches allocation regressions invisible to wall-clock. | No existing hook. | Fully net-new. Defer; out of scope for v1. |
A plausible phasing the proposal might recommend: A first (near-free, immediate value), B
alongside it, then C as the flagship runtime budget, with D layered on once a harness emits
structured metrics. E deferred.
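For Dimension B, Node's built-in zlib module is enough to get a gzipped size. A sketch: gzippedSize is a hypothetical helper, the input is a synthetic string rather than the real axe.min.js, and the compression level is an assumption the proposal would need to pin so numbers stay comparable across runs. It uses require so it runs standalone; the build scripts themselves are ES modules.

```javascript
const zlib = require('node:zlib');

// Hypothetical helper: returns raw and gzipped byte counts for some source text.
// Level 9 is an assumption; whatever level is chosen should be fixed in the budget.
function gzippedSize(source, level = 9) {
  const raw = Buffer.from(source, 'utf8');
  const gzipped = zlib.gzipSync(raw, { level });
  return { rawBytes: raw.length, gzipBytes: gzipped.length };
}

// Synthetic, highly repetitive input standing in for a built bundle.
const fakeBundle = 'function check(node){return node.hasAttribute("alt");}\n'.repeat(500);
console.log(gzippedSize(fakeBundle));
```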
Enforcement options to consider
Background on where enforcement could eventually live, to inform the proposal's recommendation —
implementing it is follow-up work, not part of this issue.
Single merge gate already exists. deploy.yml:26-56 treats the whole Test workflow as the
sole merge blocker, so any check added inside .github/workflows/test.yml inherits blocking status
with no branch-protection change — likely cheaper than a separate workflow.
A/B (bundle): one surfaced option is to enforce in the build itself — have runBytesize()
(postbuild.mjs:20-32) assert against committed ceilings the same way assertEsbuildImportLimits()
already does, so a breach fails npm run build (and thus the existing build job, test.yml:55-71)
with no new CI wiring. The build pipeline is unit-tested (npm run test:build), giving a second
possible home for size assertions.
C/D (runtime): likely a dedicated job that download-artifacts the already-uploaded axe.js
(test.yml:147) and runs a benchmark harness, kept separate from functional jobs so a timing flake
is diagnosable in isolation.
Base-vs-head delta detection (regression %, not just an absolute ceiling) is the highest-value
and least-obvious piece, since absolute ceilings drift as features land — worth a specific
recommendation in the proposal.
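A relative gate reduces to a percentage comparison between base and head measurements. The sketch below is one possible shape; regressionPercent, checkRegression, and the 2% allowance are hypothetical, and the head size is made up.

```javascript
// Percentage change from base to head; positive means head is larger or slower.
function regressionPercent(base, head) {
  if (!(base > 0)) {
    throw new RangeError('base must be a positive number');
  }
  return ((head - base) / base) * 100;
}

// Hypothetical gate: fails when growth exceeds the allowed percentage.
function checkRegression(base, head, allowedPercent) {
  const percent = regressionPercent(base, head);
  return { percent, ok: percent <= allowedPercent };
}

// Base is the minified size quoted in this issue; head is a made-up PR build.
const result = checkRegression(574_862, 580_000, 2);
console.log(`${result.percent.toFixed(2)}% change, ok=${result.ok}`);
```

The same function works for bytes or milliseconds; what differs per dimension is the allowance and where the base value comes from.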
Questions the proposal should resolve
The investigation should land a recommendation on each of these (or explicitly defer it):
Absolute ceilings, relative deltas, or both? Absolute is trivial with bytesize; relative
catches creep but needs a baseline-fetch mechanism that doesn't exist yet.
Where do baselines live? A committed JSON (analogous to sri-history.json) is the path of
least resistance, but couples every legit perf change to a baseline-bump commit. Or fetch the
base branch's build at CI time?
Runtime environment for the budget. GitHub runners are noisy — wall-clock thresholds need
generous tolerance (the timer's own tests use a 17ms ANIMATION_FRAME_TOLERANCE_MS). Browser
(WTR) vs. Node/JSDOM (where color-contrast doesn't run, per README.md:83) yield materially
different numbers; pick one as the gating environment.
Structured-metrics API. Does performanceTimer gain a structured/returnable output (vs.
console-only logMeasures), or does the harness scrape window.performance.getEntriesByType('measure') before marks are cleared? This is the
prerequisite for C/D and the one change touching shipped code (performance-timer.js).
Reference DOM anchor. What target size (and shape) does the headline budget anchor to? This
is the engineer's call to make and document, using inputs like Lighthouse's ~800/~1,400-node
thresholds and axe's >50K pain point as guidance. Also decide which variants (iframe-heavy,
color-contrast-heavy) become official fixtures — these are a maintenance surface.
Failure policy. Hard-fail the PR, or warn + label override? Which dimensions should block vs.
report-only to start?
Per-rule normalization. Budget on per-node cost rather than absolute ms so Dimension D stays
fixture-independent?
Bundle scope. English-only axe.min.js, or all locale variants (runBytesize() already
loops over every langs suffix, postbuild.mjs:21-23)? Does axe.js (the published main)
also get a budget, or only the minified artifact?
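On the runner-noise question, one common mitigation is to gate on the median of several runs plus an explicit tolerance rather than on a single sample. A sketch with hypothetical names and made-up timings; the 17 ms tolerance reuses the figure quoted above purely as a placeholder.

```javascript
// Median of a non-empty list of numbers (does not mutate the input).
function median(samples) {
  if (samples.length === 0) {
    throw new RangeError('need at least one sample');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical gate: passes when the median is within ceiling + tolerance.
function withinBudget(samplesMs, ceilingMs, toleranceMs) {
  const medianMs = median(samplesMs);
  return { medianMs, ok: medianMs <= ceilingMs + toleranceMs };
}

// Made-up timings with one noisy outlier that a single-sample gate could trip on.
console.log(withinBudget([41, 39, 120, 40, 42], 40, 17));
```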
Done = a budget proposal the team can act on
This issue is complete when the engineer has presented a written performance-budget proposal for
the team to review. That proposal should:
State a reference DOM target (size + shape), with the rationale and the data points
considered — the central decision this work exists to force.
Recommend which dimensions to budget and propose concrete starting threshold values (or
a clear method for deriving them).
Recommend how enforcement would eventually work — where it would plug into the build/CI and
the failure policy — at enough depth to scope the follow-up, without building it.
Resolve (or explicitly defer, with a recommendation) the open questions above.
Break the implementation into follow-up tickets (e.g. metric export, benchmark harness,
bundle-size gate, CI wiring) so the build work can be planned separately.
Out of scope for this issue
All implementation: the harness, structured metric export, gzip sizing, threshold files,
base-vs-head diffing, and CI gates are all follow-up work once the proposal is accepted.
Memory / heap budgeting (Dimension E).
Per-check timing granularity and iframe-overhead timing.
Everything cited above exists in-repo today and is provided as starting material for the
investigation. Everything marked net-new (benchmark harness, fixtures, structured metric export,
gzip sizing, base-vs-head diffing, threshold files) is follow-up work, not part of this issue.