test: cover LWS multi-vendor GPU resources#1350
Conversation
Signed-off-by: Xuhui Lu <swimming.fish06@gmail.com>
d7a5a23 to
fddeeb9
Compare
There was a problem hiding this comment.
Pull request overview
This PR extends unit-test coverage for LWSAccessor.GetTotalGPUsPerReplica() to ensure GPU counting works across the full set of supported vendor resource names (NVIDIA, AMD, Intel Gaudi, Intel i915, Intel Xe), including a mixed-vendor leader/worker scenario. This supports the broader multi-vendor GPU support work tracked in #1106 by guarding against regressions in GPU resource parsing for LeaderWorkerSet scale targets.
Changes:
- Added a table-driven unit test covering all supported GPU resource names for LWS GPU counting.
- Added a mixed-vendor LWS test case that combines different vendor resources across leader and worker containers.
- Introduced small test helpers to reduce duplication when building LWS/container GPU requests.
ev-shindin
left a comment
There was a problem hiding this comment.
LGTM. Look to some minor (non-blocking) comments
Signed-off-by: Xuhui Lu <swimming.fish06@gmail.com>
|
/ok-to-test |
|
🚀 Kind E2E (full) triggered by |
|
🚀 OpenShift E2E — approve and run ( |
GPU Pre-flight Check ✅GPUs are available for e2e-openshift tests. Proceeding with deployment.
|
ev-shindin
left a comment
There was a problem hiding this comment.
Thanks @xuhui-lu ! To follow-up: every existing and new test sets a non-zero GPU request, so the if total == 0 { return 1 } default is never exercised — change it to return 0 and the whole suite still passes. Add a CPU-only (no GPU requests) LWS case asserting 1
Summary
Part of #1106.
Testing
go test ./internal/utils/scaletargetgo test ./internal/utils/...Also tried
go test ./...; it is not clean locally for unrelated reasons:internal/engines/analyzers/throughput:FitITLModelflat-line test currently failstest/chart: localhelmbinary is not installedtest/e2e: entered cluster preflight / auth flow and was interrupted