Bumping llm-d release to latest version v0.8.1 #1337

Merged: mamy-CS merged 1 commit into llm-d:main from dumb0002:epp-fixes on Jun 30, 2026

dumb0002 (Collaborator) commented Jun 25, 2026

This PR bumps the llm-d release version to v0.8.1 (latest).

Proposed Changes

  • 🎁 Migrate EPP install to the upstream llm-d-router-standalone chart
    (oci://ghcr.io/llm-d/charts/llm-d-router-standalone), replacing the
    kubernetes-sigs GAIE standalone chart. Updates deploy/lib/infra_epp.sh.
  • 🧹 Pin the new llm-d-router-standalone chart and llm-d-router-endpoint-picker
    EPP image to v0.9.0 via a new LLM_D_ROUTER_VERSION variable (default v0.9.0),
    separate from LLM_D_RELEASE (default v0.8.1, controls only the guide values
    fetched from llm-d/llm-d). The two are released independently — the
    llm-d/llm-d-router repo never published a v0.8.1 chart, so deriving the
    chart version from LLM_D_RELEASE would 404. Threaded through Makefile,
    CI workflows, install scripts, and poc.mk.
  • 🧹 Bump EPP image to ghcr.io/llm-d/llm-d-router-endpoint-picker:v0.9.0
    (was llm-d-inference-scheduler:v0.8.0) following the upstream rename
    Inference Scheduler → llm-d Router. The old image repo no longer ships
    new tags.
  • 🐛 Reshape values overlays for the new chart: model-{a,b}-epp-values.yaml
    and epp-flow-control.values.yaml now use the chart's router: wrapper
    (was flat inferenceExtension:/inferencePool:) and the v0.8.0
    EndpointPickerConfig apiVersion (llm-d.ai/v1alpha1). Without this,
    the new chart fails to render with
    .Values.inferencePool.modelServers.matchLabels is required.
  • 🧹 Default LLM_D_RELEASE to v0.8.1 everywhere
    (Makefile, ci-e2e-openshift.yaml, ci-pr-checks.yaml, deploy/README.md,
    deploy/kubernetes/README.md). install-epp.sh was already at v0.8.1; the
    rest were stuck at v0.7.0, so make deploy-… and direct script invocations
    picked different versions.
  • 🧹 Update remaining references to the old chart name in poc.mk,
    install-epp.sh, cleanup.sh, deploy-infrastructure.sh, and
    kind-emulator/README.md. Comment/docs only.
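
The split between the two version variables described above can be sketched as follows. This is a minimal illustration, not the contents of deploy/lib/infra_epp.sh: the defaults, chart reference, and image name come from the list above, while the Helm release name `model-a-epp` and the values file path are hypothetical. The script only prints the command it would run and never contacts a registry.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Defaults as stated in the PR description; either can be overridden by the caller.
LLM_D_RELEASE="${LLM_D_RELEASE:-v0.8.1}"                 # selects guide values from llm-d/llm-d
LLM_D_ROUTER_VERSION="${LLM_D_ROUTER_VERSION:-v0.9.0}"   # pins the router chart and EPP image

CHART_REF="oci://ghcr.io/llm-d/charts/llm-d-router-standalone"
EPP_IMAGE="ghcr.io/llm-d/llm-d-router-endpoint-picker:${LLM_D_ROUTER_VERSION}"

# Hypothetical release name and values file, for illustration only.
# The chart version is taken from LLM_D_ROUTER_VERSION, never from LLM_D_RELEASE.
HELM_CMD="helm upgrade --install model-a-epp ${CHART_REF} --version ${LLM_D_ROUTER_VERSION} -f model-a-epp-values.yaml"

echo "guides ref: ${LLM_D_RELEASE}"
echo "EPP image:  ${EPP_IMAGE}"
echo "${HELM_CMD}"
```

Running it with `LLM_D_ROUTER_VERSION=v0.9.0 LLM_D_RELEASE=v0.8.1` (or with neither set) shows that changing `LLM_D_RELEASE` alone leaves the chart version and image tag untouched.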

Pre-review Checklist

  • E2E tests for any new behavior
  • Docs PR for any user-facing impact
  • Proposal PR for any new enhancement or change to existing behavior

Release Note

  The bundled EPP image is now `ghcr.io/llm-d/llm-d-router-endpoint-picker:v0.9.0` (was `ghcr.io/llm-d/llm-d-inference-scheduler:v0.8.0`); the upstream image was
  renamed when the project rebranded to llm-d Router. The EPP Helm install also moved from the kubernetes-sigs GAIE standalone chart to the upstream
  `llm-d-router-standalone` chart at `oci://ghcr.io/llm-d/charts/llm-d-router-standalone`, with the chart version now pinned by `LLM_D_RELEASE` (default `v0.8.1`).
  action required: users overriding `LLM_D_RELEASE` to `v0.7.x` are no longer supported by `deploy/install-epp.sh`; users with custom EPP values overrides must rewrap
   them under the chart's new `router:` key (e.g. `inferenceExtension.image` → `router.epp.image`, `inferenceExtension.sidecar` → `router.proxy`). `GAIE_VERSION`
  continues to pin the GAIE CRDs independently.
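
The key rewrap mentioned in the note can be sketched as a small helper for translating `--set`-style override paths. Only the two mappings named in the release note are encoded; the function name `rewrap_key` is hypothetical, and carrying sub-keys (such as `.tag`) over unchanged is an assumption that should be checked against the chart's values schema.

```shell
#!/usr/bin/env bash

# Translate an old flat values path into its location under the router: wrapper.
# Only the two mappings from the release note are known; anything else is rejected.
rewrap_key() {
  case "$1" in
    inferenceExtension.image|inferenceExtension.image.*)
      # Assumption: sub-keys below image keep their names.
      printf 'router.epp.image%s\n' "${1#inferenceExtension.image}" ;;
    inferenceExtension.sidecar|inferenceExtension.sidecar.*)
      # Assumption: sub-keys below sidecar keep their names.
      printf 'router.proxy%s\n' "${1#inferenceExtension.sidecar}" ;;
    *)
      printf 'no known mapping for %s\n' "$1" >&2
      return 1 ;;
  esac
}

rewrap_key inferenceExtension.image        # prints router.epp.image
rewrap_key inferenceExtension.sidecar      # prints router.proxy
```

Keys without a documented mapping return a non-zero status instead of being guessed, so a migration script built on this would stop rather than silently write values to the wrong place.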

Docs

Copilot AI review requested due to automatic review settings June 25, 2026 16:02

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates the repository’s pinned llm-d release version defaults to v0.8.0 for more consistent local deploy and CI behavior, and updates the EPP image reference in the HPA coordinator sample values to the renamed Router EPP image.

Changes:

  • Bump LLM_D_RELEASE default from v0.7.0 → v0.8.0 in the Makefile, CI workflows, and deployment docs.
  • Update coordinator sample EPP values to use ghcr.io/llm-d/llm-d-router-endpoint-picker:v0.9.0 (renamed from llm-d-inference-scheduler).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| Makefile | Updates default LLM_D_RELEASE used by make-driven deploy/test targets. |
| deploy/README.md | Updates example invocation to use LLM_D_RELEASE=v0.8.0. |
| deploy/kubernetes/README.md | Clarifies the current default LLM_D_RELEASE while keeping guide selection guidance intact. |
| config/samples/hpa/co-ordinator/model-a-epp-values.yaml | Switches sample EPP image repo/name and bumps its tag. |
| config/samples/hpa/co-ordinator/model-b-epp-values.yaml | Switches sample EPP image repo/name and bumps its tag. |
| .github/workflows/ci-pr-checks.yaml | Pins CI e2e jobs to LLM_D_RELEASE=v0.8.0. |
| .github/workflows/ci-e2e-openshift.yaml | Pins OpenShift e2e to LLM_D_RELEASE=v0.8.0. |

@dumb0002 (Collaborator, Author)

/ok-to-test

@dumb0002 dumb0002 requested review from asm582 and lionelvillard June 25, 2026 17:38
@dumb0002 dumb0002 added the ready-for-review and area/installation labels Jun 25, 2026
lionelvillard previously approved these changes Jun 25, 2026
@dumb0002 dumb0002 enabled auto-merge (squash) June 25, 2026 17:45
@github-actions (Contributor)

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

@dumb0002 (Collaborator, Author)

/ok-to-test

@github-actions (Contributor)

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

@github-actions (Contributor)

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions (Contributor)

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 26 | 24 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |
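
The headroom arithmetic in this report can be reproduced with a short sketch. The numbers are taken from the comment above; the function name `gpu_preflight` and the three-level ok/warn/fail thresholds are assumptions made for illustration, since the actual pre-flight workflow logic is not shown in this conversation.

```shell
#!/usr/bin/env bash

# Classify GPU headroom from cluster totals.
# Assumed policy: below the minimum fails, below the recommended count warns.
gpu_preflight() {
  local total=$1 allocated=$2 min=$3 recommended=$4
  local available=$(( total - allocated ))
  if (( available < min )); then
    echo "fail available=${available}"
  elif (( available < recommended )); then
    echo "warn available=${available}"
  else
    echo "ok available=${available}"
  fi
}

gpu_preflight 50 26 4 6   # prints: ok available=24
```

With 50 total and 26 allocated, 24 GPUs remain, which clears both the minimum of 4 and the recommended 6.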

mamy-CS (Collaborator) commented Jun 29, 2026

/ok-to-test

@github-actions (Contributor)

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

@github-actions (Contributor)

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions (Contributor)

GPU Pre-flight Check ⚠️

Low GPU headroom — tests may fail during scale-up phases.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | | | |

| Cluster | Value |
| --- | --- |
| Nodes | ( with GPUs) |
| Total CPU | cores |
| Total Memory | Gi |
| GPUs required | (min) / (recommended) |

mamy-CS (Collaborator) left a comment

The release note could be clearer. As I understand it, the chart version is controlled by LLM_D_ROUTER_VERSION, not LLM_D_RELEASE, correct? Also, the default in the actual diff is v0.8.1, not v0.8.0 as the note says. Worth fixing before merge so users overriding these variables know which knob to turn.

POC_WVA_NS := workload-variant-autoscaler-system
POC_MON_NS := workload-variant-autoscaler-monitoring
GAIE_VERSION ?= v1.5.0
LLM_D_RELEASE ?= v0.8.1

Collaborator

Is this used? It looks like only LLM_D_ROUTER_VERSION is used.

Collaborator, Author

Yes, it's used to select the right guides to deploy from the llm-d repo (it's used in the script infra_epp.sh).

Signed-off-by: Braulio Dumba <Braulio.Dumba@ibm.com>
@dumb0002 dumb0002 changed the title from "Bumping llm-d release to latest version v0.8.0" to "Bumping llm-d release to latest version v0.8.1" Jun 29, 2026
dumb0002 (Collaborator, Author) commented Jun 29, 2026

@mamy-CS @lionelvillard, I was able to successfully run all e2e tests in my local kind cluster; see the attached logs:
WVA_Full_E2E_Tests.txt

This PR is in good shape to be merged if there are no further comments.

dumb0002 (Collaborator, Author) commented Jun 29, 2026

> The release note could be clearer. As I understand it, the chart version is controlled by LLM_D_ROUTER_VERSION, not LLM_D_RELEASE, correct? Also, the default in the actual diff is v0.8.1, not v0.8.0 as the note says. Worth fixing before merge so users overriding these variables know which knob to turn.

@mamy-CS, the llm-d router version does not follow the same release numbering as upstream llm-d; see https://github.com/llm-d/llm-d/releases/tag/v0.8.1. I updated the release number to v0.8.1 as suggested.

@mamy-CS mamy-CS disabled auto-merge June 30, 2026 15:23
@mamy-CS mamy-CS merged commit 46ae7b6 into llm-d:main Jun 30, 2026
21 checks passed
asm582 added a commit that referenced this pull request Jun 30, 2026
- Delete hack/benchmark folder and hack/benchmark-jobs-template.yaml
- Remove deploy_benchmark_grafana() and deploy_optional_benchmark_grafana() from infra_monitoring.sh

The benchmark tooling was added in #1337 as an undocumented side effect
of the llm-d release bump. It was never integrated into CI/CD workflows
or actively maintained, and no team is actively using it. Removing it
reduces code maintenance burden and clarifies intent.

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
Labels

area/installation, ready-for-review

4 participants