feat(engine): add HPA-style stabilization of scaling recommendations by ev-shindin · Pull Request #1353 · llm-d/llm-d-workload-variant-autoscaler

ev-shindin · 2026-06-28T09:29:23Z

Proposed Changes

🎁 Add internal/stabilization: a clean-room port of the Kubernetes HPA configurable scaling behavior (trailing window — min over the up-window / max over the down-window — plus per-period Pods/Percent rate policies with selectPolicy, tolerance deadband, and min/max clamp). Reuses the autoscaling/v2 behavior types so it is configured the familiar HPA way.
🎁 Wire stabilization into the V2 engine right after the optimizer and before the enforcer, keyed per namespace/model/variant[/role], with a structured stabilization-decision cycle log. Gated by a new enableStabilization saturation-config flag, default off. Intended deployment contract: WVA owns stabilization, HPA behavior delays set to 0.
🧹 Extract VariantDecision.ActionForTarget and reuse it from both the enforcer and the stabilizer (removes a duplicated action-recompute switch).

This is the foundation for owning scale-down so the priority-weighted rescale proposal (#1238) can actuate cleanly, and a prerequisite for direct /scale actuation in the 1→N range. Per-policy config via the ConfigMap, direct actuation, and the V1 path are follow-ups (see #1352).

Pre-review Checklist

E2E tests for any new behavior — N/A while default-off (no behavior change on upgrade). Covered by internal/stabilization unit specs and the saturation engine envtest suite; E2E will accompany enabling it by default / direct actuation.
Docs PR for any user-facing impact — developer design note included (docs/plans/engine/stabilization.md); no user-facing impact while the flag is off. User-facing docs will accompany the per-policy config follow-up.
Proposal PR for any new enhancement or change to existing behavior — rationale tracked in Add HPA-style stabilization of scaling recommendations in WVA #1352 and related to the rescale proposal docs(proposals): add priority-weighted rescale proposal #1238; a dedicated proposal for the operator-facing config surface is a follow-up.

Release Note

Added opt-in HPA-style stabilization of WVA scaling recommendations (a trailing
scale-down window plus per-period rate policies), enabled via
`enableStabilization` in the saturation scaling config. Disabled by default.

Docs

No user-visible impact while default-off; user-facing documentation will accompany the per-policy configuration follow-up.

The optimizer emits a fresh per-variant replica target every cycle, which flaps when load is noisy. Today WVA leans on a downstream HPA to damp this; stabilizing in-process is the prerequisite for WVA actuating /scale directly. Add internal/stabilization, a clean-room port of the Kubernetes HPA configurable scaling behavior: a trailing stabilization window (min over the scale-up window as a floor, max over the scale-down window as a ceiling), a tolerance deadband, per-period Pods/Percent rate policies with selectPolicy, and a min/max clamp. The autoscaling/v2 behavior types are reused so operators configure damping the familiar HPA way; the algorithm is reimplemented because the upstream logic is unexported in k8s.io/kubernetes, which is not a consumable module. Wire it into the V2 engine right after the optimizer and before the enforcer, keyed namespace/variant[/role] so disaggregated P/D targets damp independently, emitting a stabilization-decision cycle log. Gated by a new enableStabilization saturation-config flag (default off), so upgrades see no behavior change; when enabled, the HPA default behavior is applied. Includes table-driven Ginkgo specs with an injectable clock and a design note under docs/plans/engine. Signed-off-by: Evgeny Shindin <evgensh@il.ibm.com>

Fixes from the multi-agent review of the previous commit: - stabilization: hold one lock for the whole Stabilize call so concurrent same-key callers observe and update per-key history atomically (was three separate critical sections with a TOCTOU on the rate budget). - stabilization: give each direction its own SelectPolicy pointer in DefaultBehavior so a caller overriding one direction by dereference cannot corrupt the other. - stabilization: add Forget(active) and call it each cycle so per-key history is bounded to the live set of variants; replace the hand-rolled pruneEvents with slices.DeleteFunc. - interfaces: add VariantDecision.ActionForTarget and use it from both the stabilizer's retargetDecision and the enforcer's updateDecisionAction, removing the duplicated action-recompute switch. - engine: use ptr.Deref for the optional replica bounds instead of a local helper. Tests: scale-down selectPolicy table (incl. Disabled), scale-from-zero tolerance guard, Forget retain/drop, engine applyStabilization enable-gate / key construction / action recompute, and EnableStabilization config-merge (including the intentional true-sticky semantics). Signed-off-by: Evgeny Shindin <evgensh@il.ibm.com>

Address the round-2 review nits (no behavior change): - stabilization: rename Forget(active) to Retain(active) — the argument is the set of keys to keep, so the name now matches the semantics. - engine: move the "0 means no floor / no cap" note from retargetDecision's doc (which never sees the bounds) to the ptr.Deref call site that does. Tests: - interfaces: direct table test for VariantDecision.ActionForTarget covering the >, <, and == (no-change) branches. - stabilization: Retain now also covers the empty-active-set eviction and the rate-event-budget reset (the prior specs seeded desired==current, so no scale event was recorded and the event-map clearing was never exercised). Signed-off-by: Evgeny Shindin <evgensh@il.ibm.com>

Headline fix from the third review round: the per-period rate limit reconstructed the period-start replica count from only the same-direction scale events (current - added for up, current + removed for down). The HPA uses both directions (current - added + removed) for each. With the old formula, an up-then-down oscillation inside one policy period drove the period-start baseline negative and pinned the next legitimate scale-up low (e.g. 2->6 then 6->2 then a spike back to 6 was capped at 2). Dormant under the wired defaults because the 300s down window suppresses the intervening down event, but a real divergence under responsive custom configs. Added a regression test reproducing the stuck-low scenario. Also: - include the model in the stabilizer key (namespace/model/variant[/role]) so two models in a namespace can never share a history bucket. - correct the defaults documentation: the magnitudes (4 pods, 100%) and the 300s down window match the HPA, but the 60s policy period is a deliberate choice for WVA's ~30s optimize cadence, not the controller's 15s default (the autoscaling/v2 API doc itself states 60s; the controller defaults to 15s). Dropped the inaccurate "mirrors upstream" wording. - clarify the recordScaleEvent comment about the post-stabilization enforcer (benign under-count, never less safe) and add a doc comment to reason(). Tests: cross-direction budgeting regression, down-direction tolerance, current-inside-band freeze, percent scale-down multi-cycle, reason() strings, and the applyStabilization pass-through (no-retarget) path. Signed-off-by: Evgeny Shindin <evgensh@il.ibm.com>

Round-4 review confirmed the cross-direction rate-budget fix correct with no regressions. Remaining items were doc/test only: - fix the stale Args.Key doc example (namespace/model/variant[/role]) and a run-on in the package doc after the spec URL. - add the symmetric scale-down cross-direction regression test (an intervening up event must not inflate the down period-start baseline) and a test for the scale-down floor-to-zero branch (a Pods policy removing more than the period-start count clamps to 0). Signed-off-by: Evgeny Shindin <evgensh@il.ibm.com>

The scaleDecision test helper always received the same namespace, variant name, and role, which golangci-lint's unparam flags. Reduce it to the two parameters that actually vary (current, target) and inline the constants. Signed-off-by: Evgeny Shindin <evgensh@il.ibm.com>

ev-shindin · 2026-06-28T11:37:28Z

/ok-to-test

github-actions · 2026-06-28T11:37:36Z

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

github-actions · 2026-06-28T11:37:42Z

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

github-actions · 2026-06-28T11:40:37Z

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource	Total	Allocated	Available
GPUs	50	29	21

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)

lionelvillard · 2026-07-02T13:15:16Z

Is that really necessary? User will have to configure stabilization in two different places, is that intended?

ev-shindin added 5 commits June 28, 2026 10:38

ev-shindin added this to the v0.9.0 release milestone Jun 28, 2026

ev-shindin self-assigned this Jun 28, 2026

ev-shindin added the area/engine label Jun 28, 2026

ev-shindin requested a review from biranofer June 28, 2026 09:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(engine): add HPA-style stabilization of scaling recommendations#1353

feat(engine): add HPA-style stabilization of scaling recommendations#1353
ev-shindin wants to merge 6 commits into
llm-d:mainfrom
ev-shindin:feat/scaling-stabilization

ev-shindin commented Jun 28, 2026

Uh oh!

ev-shindin commented Jun 28, 2026

Uh oh!

github-actions Bot commented Jun 28, 2026

Uh oh!

github-actions Bot commented Jun 28, 2026

Uh oh!

github-actions Bot commented Jun 28, 2026

Uh oh!

lionelvillard commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ev-shindin commented Jun 28, 2026

Proposed Changes

Pre-review Checklist

Uh oh!

ev-shindin commented Jun 28, 2026

Uh oh!

github-actions Bot commented Jun 28, 2026

Uh oh!

github-actions Bot commented Jun 28, 2026

Uh oh!

github-actions Bot commented Jun 28, 2026

GPU Pre-flight Check ✅

Uh oh!

lionelvillard commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants