Bumping llm-d release to latest version v0.8.1 #1337
Pull request overview
This PR updates the repository’s pinned llm-d release version defaults to v0.8.0 for more consistent local deploy and CI behavior, and updates the EPP image reference in the HPA coordinator sample values to the renamed Router EPP image.
Changes:
- Bump LLM_D_RELEASE default from v0.7.0 → v0.8.0 in the Makefile, CI workflows, and deployment docs.
- Update coordinator sample EPP values to use ghcr.io/llm-d/llm-d-router-endpoint-picker:v0.9.0 (renamed from llm-d-inference-scheduler).
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| Makefile | Updates default LLM_D_RELEASE used by make-driven deploy/test targets. |
| deploy/README.md | Updates example invocation to use LLM_D_RELEASE=v0.8.0. |
| deploy/kubernetes/README.md | Clarifies the current default LLM_D_RELEASE while keeping guide selection guidance intact. |
| config/samples/hpa/co-ordinator/model-a-epp-values.yaml | Switches sample EPP image repo/name and bumps its tag. |
| config/samples/hpa/co-ordinator/model-b-epp-values.yaml | Switches sample EPP image repo/name and bumps its tag. |
| .github/workflows/ci-pr-checks.yaml | Pins CI e2e jobs to LLM_D_RELEASE=v0.8.0. |
| .github/workflows/ci-e2e-openshift.yaml | Pins OpenShift e2e to LLM_D_RELEASE=v0.8.0. |
/ok-to-test
🚀 Kind E2E (full) triggered by
/ok-to-test
🚀 Kind E2E (full) triggered by
🚀 OpenShift E2E — approve and run ( |
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
/ok-to-test
🚀 Kind E2E (full) triggered by
🚀 OpenShift E2E — approve and run ( |
mamy-CS
left a comment
The release note could be clearer. As I understand it, the chart version is controlled by LLM_D_ROUTER_VERSION, not LLM_D_RELEASE, correct? Also, the default in the actual diff is v0.8.1, not v0.8.0 as the note says. Worth fixing before merge so users overriding these variables know which knob to turn.
POC_WVA_NS := workload-variant-autoscaler-system
POC_MON_NS := workload-variant-autoscaler-monitoring
GAIE_VERSION ?= v1.5.0
LLM_D_RELEASE ?= v0.8.1
Is this used? It looks like only LLM_D_ROUTER_VERSION is used.
Yes, it's used to select the right guides to deploy from the llm-d repo (it's used in the script infra_epp.sh).
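A quick way to settle an "is this used?" question is to search the deploy scripts for the variable. The sketch below is a generic helper and not code from this repository; var_users is a hypothetical name, and the deploy directory in the example comment is an assumption about where the scripts live.

```shell
# var_users VAR DIR: list files under DIR that mention VAR as a whole word.
# Hypothetical helper for illustration; it is not part of the repository.
var_users() {
  # -r recurse, -l print file names only, -w match whole words so that
  # LLM_D_RELEASE does not also match a longer name such as LLM_D_RELEASE_NOTES.
  grep -rlw -- "$1" "$2" | sort
}

# Example (assumes it is run from the repository root):
#   var_users LLM_D_RELEASE deploy
```

A text search only shows where the name appears; whether the value actually changes behaviour still needs a look at the matching script.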
Signed-off-by: Braulio Dumba <Braulio.Dumba@ibm.com>
@mamy-CS @lionelvillard, I was able to successfully run all e2e tests in my local kind cluster (see the attached logs). This PR is in good shape to be merged if there are no further comments.
@mamy-CS, the llm-d router version does not follow the same release numbering as upstream llm-d (see https://github.com/llm-d/llm-d/releases/tag/v0.8.1). I updated the release number to
- Delete hack/benchmark folder and hack/benchmark-jobs-template.yaml
- Remove deploy_benchmark_grafana() and deploy_optional_benchmark_grafana() from infra_monitoring.sh

The benchmark tooling was added in #1337 as an undocumented side effect of the llm-d release bump. It was never integrated into CI/CD workflows or actively maintained, and no team is actively using it. Removing it reduces code maintenance burden and clarifies intent.

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
This PR bumps the llm-d release version to v0.8.1 (latest).
Proposed Changes
(oci://ghcr.io/llm-d/charts/llm-d-router-standalone), replacing the
kubernetes-sigs GAIE standalone chart. Updates deploy/lib/infra_epp.sh.
EPP image to v0.9.0 via a new LLM_D_ROUTER_VERSION variable (default v0.9.0),
separate from LLM_D_RELEASE (default v0.8.1, controls only the guide values
fetched from llm-d/llm-d). The two are released independently — the
llm-d/llm-d-router repo never published a v0.8.1 chart, so deriving the
chart version from LLM_D_RELEASE would 404. Threaded through Makefile,
CI workflows, install scripts, and poc.mk.
(was llm-d-inference-scheduler:v0.8.0) following the upstream rename
Inference Scheduler → llm-d Router. The old image repo no longer ships
new tags.
and epp-flow-control.values.yaml now use the chart's router: wrapper
(was flat inferenceExtension:/inferencePool:) and the v0.8.0
EndpointPickerConfig apiVersion (llm-d.ai/v1alpha1). Without this,
the new chart fails to render with
.Values.inferencePool.modelServers.matchLabels is required.
(Makefile, ci-e2e-openshift.yaml, ci-pr-checks.yaml, deploy/README.md,
deploy/kubernetes/README.md). install-epp.sh was already at v0.8.1; the
rest were stuck at v0.7.0, so make deploy-… and direct script invocations
picked different versions.
install-epp.sh, cleanup.sh, deploy-infrastructure.sh, and
kind-emulator/README.md. Comment/docs only.
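The split between the two version knobs described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of infra_epp.sh: the chart URL, image repository, and defaults come from the description, while chart_install_cmd, epp_image, and the release name "epp" are hypothetical. The helm command is printed and never executed.

```shell
ROUTER_CHART="oci://ghcr.io/llm-d/charts/llm-d-router-standalone"
EPP_IMAGE_REPO="ghcr.io/llm-d/llm-d-router-endpoint-picker"

# Defaults from the PR description; either one can be overridden on its own.
LLM_D_RELEASE="${LLM_D_RELEASE:-v0.8.1}"                # guide values from llm-d/llm-d
LLM_D_ROUTER_VERSION="${LLM_D_ROUTER_VERSION:-v0.9.0}"  # chart version and EPP image tag

# chart_install_cmd VERSION: print (not run) a helm command pinned to VERSION.
# "epp" is a placeholder release name, not one taken from the repository.
chart_install_cmd() {
  printf 'helm install epp %s --version %s\n' "$ROUTER_CHART" "$1"
}

# epp_image VERSION: image reference for the renamed Router EPP image.
epp_image() {
  printf '%s:%s\n' "$EPP_IMAGE_REPO" "$1"
}

echo "guides ref: $LLM_D_RELEASE"
chart_install_cmd "$LLM_D_ROUTER_VERSION"
epp_image "$LLM_D_ROUTER_VERSION"
```

Keeping the chart version on its own variable means a guide bump to a release that has no matching router chart does not change which chart is pulled.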
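Values files that still use the old flat layout mentioned above can be spotted before the chart is rendered. The following is a rough sketch, assuming the old layout puts inferenceExtension: or inferencePool: at the top level of the file as the description says; has_flat_layout is a hypothetical helper, and a text match like this is no substitute for actually rendering the chart.

```shell
# has_flat_layout FILE: succeed if FILE still has the old flat top-level keys.
# Only keys starting in column 0 match, so the same names indented under a
# wrapper key such as router: are ignored.
has_flat_layout() {
  grep -Eq '^(inferenceExtension|inferencePool):' "$1"
}

# Example (file name taken from the PR description, its path is assumed):
#   has_flat_layout epp-flow-control.values.yaml && echo "needs migration"
```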
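The version drift described above (install-epp.sh at v0.8.1 while other files stayed at v0.7.0) is the kind of mismatch a small check can catch. A minimal sketch follows, assuming each file pins the release on a single line that names LLM_D_RELEASE followed by a vMAJOR.MINOR.PATCH value; pinned_releases and check_pins are hypothetical helpers, not scripts from the repository.

```shell
# pinned_releases FILE...: print each distinct LLM_D_RELEASE version found.
# Handles forms such as "LLM_D_RELEASE ?= v0.8.1", "LLM_D_RELEASE: v0.8.1",
# and "${LLM_D_RELEASE:-v0.8.1}".
pinned_releases() {
  grep -Eho 'LLM_D_RELEASE[^v]*v[0-9]+\.[0-9]+\.[0-9]+' "$@" \
    | grep -Eo 'v[0-9]+\.[0-9]+\.[0-9]+$' \
    | sort -u
}

# check_pins FILE...: succeed only if the files agree on at most one version.
check_pins() {
  count=$(pinned_releases "$@" | wc -l | tr -d ' ')
  [ "$count" -le 1 ]
}

# Example (file names from the PR description, run from the repository root):
#   check_pins Makefile deploy/README.md .github/workflows/ci-pr-checks.yaml
```

The pattern is deliberately conservative: a line where other lowercase text containing "v" sits between the variable name and the version is skipped, so prose mentions may be missed.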
Pre-review Checklist
Release Note
Docs