fix(optimizer): prefer cheaper variant when it covers overflow at lower absolute cost#1252
Open
biranofer wants to merge 1 commit into
Open
fix(optimizer): prefer cheaper variant when it covers overflow at lower absolute cost#1252biranofer wants to merge 1 commit into
biranofer wants to merge 1 commit into
Conversation
…er absolute cost costAwareScaleUp greedily allocated replicas to the most cost-efficient variant (lowest cost/perReplicaCapacity). This missed cases where a cheaper-by-absolute-cost variant can cover the required capacity overflow at lower total spend — e.g. 1 replica at cost 10 vs 1 replica at cost 5 for the same demand. Add a per-iteration check: before committing to the Nth replica of the current variant, compare its cost against the cost of covering the overflow with the cheapest-by-absolute-cost variant. If the fallback is strictly cheaper AND has remaining headroom (below its maxReplicas cap), round down and let the loop assign remaining demand to the cheaper variant. The N-1 terms cancel, so the condition reduces to: fallbackReplicas * cheapest.Cost < vc.Cost On equality the more efficient variant is kept (strict <), which is correct since at equal cost it provides more capacity per dollar. Fixes: llm-d#1251 Signed-off-by: Biran <biran@il.ibm.com> Signed-off-by: biran <biranofer@gmail.com>
deanlorenz
reviewed
Jun 9, 2026
Collaborator
Author
|
I'm considering to add a CS (Cost Save) configuration parameter [0..1] that prefer the cheapest only if it saves CS of the cost, i,e. A reasonable default can be 0.2 |
deanlorenz
reviewed
Jun 10, 2026
deanlorenz
left a comment
Collaborator
There was a problem hiding this comment.
- please add unit tests.
- please resolve conflicts after rebase to main.
NIT
- cheapestVariantCapacity does not break ties. order may change. may be harder to debug and test.
- if current is capped then cheapest must cover all remaining -- unlikely to be cheaper than one current replica.
Collaborator
|
@biranofer please rebase and address @deanlorenz review |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
costAwareScaleUpgreedily picked the most cost-efficient variant (lowest cost/perReplicaCapacity) but overpaid when a cheaper-by-absolute-cost variant could cover the required overflow at lower total spendmaxReplicascapDetails
The N-1 terms cancel algebraically, so the check reduces to:
On equality the more efficient variant is kept (strict
<) — at equal cost it provides more capacity per dollar.Fixes #1251
Test plan
make test— all 113 specs green includingTestScaleToZero/should respect maxReplicas during scale-up)🤖 Generated with Claude Code