feat(engine): add HPA-style stabilization of scaling recommendations#1353
Open
ev-shindin wants to merge 6 commits into
Open
feat(engine): add HPA-style stabilization of scaling recommendations#1353ev-shindin wants to merge 6 commits into
ev-shindin wants to merge 6 commits into
Conversation
The optimizer emits a fresh per-variant replica target every cycle, which flaps when load is noisy. Today WVA leans on a downstream HPA to damp this; stabilizing in-process is the prerequisite for WVA actuating /scale directly. Add internal/stabilization, a clean-room port of the Kubernetes HPA configurable scaling behavior: a trailing stabilization window (min over the scale-up window as a floor, max over the scale-down window as a ceiling), a tolerance deadband, per-period Pods/Percent rate policies with selectPolicy, and a min/max clamp. The autoscaling/v2 behavior types are reused so operators configure damping the familiar HPA way; the algorithm is reimplemented because the upstream logic is unexported in k8s.io/kubernetes, which is not a consumable module. Wire it into the V2 engine right after the optimizer and before the enforcer, keyed namespace/variant[/role] so disaggregated P/D targets damp independently, emitting a stabilization-decision cycle log. Gated by a new enableStabilization saturation-config flag (default off), so upgrades see no behavior change; when enabled, the HPA default behavior is applied. Includes table-driven Ginkgo specs with an injectable clock and a design note under docs/plans/engine. Signed-off-by: Evgeny Shindin <evgensh@il.ibm.com>
Fixes from the multi-agent review of the previous commit: - stabilization: hold one lock for the whole Stabilize call so concurrent same-key callers observe and update per-key history atomically (was three separate critical sections with a TOCTOU on the rate budget). - stabilization: give each direction its own SelectPolicy pointer in DefaultBehavior so a caller overriding one direction by dereference cannot corrupt the other. - stabilization: add Forget(active) and call it each cycle so per-key history is bounded to the live set of variants; replace the hand-rolled pruneEvents with slices.DeleteFunc. - interfaces: add VariantDecision.ActionForTarget and use it from both the stabilizer's retargetDecision and the enforcer's updateDecisionAction, removing the duplicated action-recompute switch. - engine: use ptr.Deref for the optional replica bounds instead of a local helper. Tests: scale-down selectPolicy table (incl. Disabled), scale-from-zero tolerance guard, Forget retain/drop, engine applyStabilization enable-gate / key construction / action recompute, and EnableStabilization config-merge (including the intentional true-sticky semantics). Signed-off-by: Evgeny Shindin <evgensh@il.ibm.com>
Address the round-2 review nits (no behavior change): - stabilization: rename Forget(active) to Retain(active) — the argument is the set of keys to keep, so the name now matches the semantics. - engine: move the "0 means no floor / no cap" note from retargetDecision's doc (which never sees the bounds) to the ptr.Deref call site that does. Tests: - interfaces: direct table test for VariantDecision.ActionForTarget covering the >, <, and == (no-change) branches. - stabilization: Retain now also covers the empty-active-set eviction and the rate-event-budget reset (the prior specs seeded desired==current, so no scale event was recorded and the event-map clearing was never exercised). Signed-off-by: Evgeny Shindin <evgensh@il.ibm.com>
Headline fix from the third review round: the per-period rate limit reconstructed the period-start replica count from only the same-direction scale events (current - added for up, current + removed for down). The HPA uses both directions (current - added + removed) for each. With the old formula, an up-then-down oscillation inside one policy period drove the period-start baseline negative and pinned the next legitimate scale-up low (e.g. 2->6 then 6->2 then a spike back to 6 was capped at 2). Dormant under the wired defaults because the 300s down window suppresses the intervening down event, but a real divergence under responsive custom configs. Added a regression test reproducing the stuck-low scenario. Also: - include the model in the stabilizer key (namespace/model/variant[/role]) so two models in a namespace can never share a history bucket. - correct the defaults documentation: the magnitudes (4 pods, 100%) and the 300s down window match the HPA, but the 60s policy period is a deliberate choice for WVA's ~30s optimize cadence, not the controller's 15s default (the autoscaling/v2 API doc itself states 60s; the controller defaults to 15s). Dropped the inaccurate "mirrors upstream" wording. - clarify the recordScaleEvent comment about the post-stabilization enforcer (benign under-count, never less safe) and add a doc comment to reason(). Tests: cross-direction budgeting regression, down-direction tolerance, current-inside-band freeze, percent scale-down multi-cycle, reason() strings, and the applyStabilization pass-through (no-retarget) path. Signed-off-by: Evgeny Shindin <evgensh@il.ibm.com>
Round-4 review confirmed the cross-direction rate-budget fix correct with no regressions. Remaining items were doc/test only: - fix the stale Args.Key doc example (namespace/model/variant[/role]) and a run-on in the package doc after the spec URL. - add the symmetric scale-down cross-direction regression test (an intervening up event must not inflate the down period-start baseline) and a test for the scale-down floor-to-zero branch (a Pods policy removing more than the period-start count clamps to 0). Signed-off-by: Evgeny Shindin <evgensh@il.ibm.com>
The scaleDecision test helper always received the same namespace, variant name, and role, which golangci-lint's unparam flags. Reduce it to the two parameters that actually vary (current, target) and inline the constants. Signed-off-by: Evgeny Shindin <evgensh@il.ibm.com>
Collaborator
Author
|
/ok-to-test |
Contributor
|
🚀 Kind E2E (full) triggered by |
Contributor
|
🚀 OpenShift E2E — approve and run ( |
Contributor
GPU Pre-flight Check ✅GPUs are available for e2e-openshift tests. Proceeding with deployment.
|
Collaborator
|
Is that really necessary? User will have to configure stabilization in two different places, is that intended? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #1352
Proposed Changes
internal/stabilization: a clean-room port of the Kubernetes HPA configurable scaling behavior (trailing window — min over the up-window / max over the down-window — plus per-periodPods/Percentrate policies withselectPolicy, tolerance deadband, and min/max clamp). Reuses theautoscaling/v2behavior types so it is configured the familiar HPA way.namespace/model/variant[/role], with a structuredstabilization-decisioncycle log. Gated by a newenableStabilizationsaturation-config flag, default off. Intended deployment contract: WVA owns stabilization, HPAbehaviordelays set to0.VariantDecision.ActionForTargetand reuse it from both the enforcer and the stabilizer (removes a duplicated action-recompute switch).This is the foundation for owning scale-down so the priority-weighted rescale proposal (#1238) can actuate cleanly, and a prerequisite for direct
/scaleactuation in the 1→N range. Per-policy config via the ConfigMap, direct actuation, and the V1 path are follow-ups (see #1352).Pre-review Checklist
internal/stabilizationunit specs and the saturation engine envtest suite; E2E will accompany enabling it by default / direct actuation.docs/plans/engine/stabilization.md); no user-facing impact while the flag is off. User-facing docs will accompany the per-policy config follow-up.Release Note
Docs
No user-visible impact while default-off; user-facing documentation will accompany the per-policy configuration follow-up.