[AMD] Register 2 recently-added CPU/ROCm-safe tests for AMD 1-GPU PR CI #29680

Open: michaelzhang-ai wants to merge 1 commit into sgl-project:main from michaelzhang-ai:amd/add-1gpu-small-recent-nv

michaelzhang-ai (Collaborator) commented Jun 29, 2026

Motivation

These two tests were recently added to NVIDIA per-commit CI and are AMD-safe with no code changes — they only need a register_amd_ci(...) line next to the existing register_cuda_ci(...). This closes part of the AMD-vs-NVIDIA per-commit coverage gap surfaced by the ROCm upstream-ci dashboard.

Registered files

| File | AMD suite | est_time | Why it's safe on ROCm |
| --- | --- | --- | --- |
| test/registered/models/test_vit_pos_embed_interpolate.py | stage-a-test-1-gpu-small-amd | 20 | Pure torch embedding lookups + arithmetic, asserted bit-exact (rtol=0/atol=0). Already runs on CPU and CUDA; no fp8 / FlashInfer / custom-kernel paths. Heavy model deps are guarded with try/except ... skipTest. |
| test/registered/unit/mem_cache/test_minimax_sparse_pool_host_unit.py | stage-b-test-1-gpu-small-amd | 9 | The integration class is pure-CPU (device="cpu"). The device↔host transfer class is already ROCm-aware: setUp enables it for is_cuda() or is_hip() and skips NPU/XPU. |

Both files keep their existing register_cuda_ci(...) and register_cpu_ci(...) calls unchanged. Only register_amd_ci(...) is added, using the legacy suite= shape, so that the effective suite resolves to the canonical AMD per-commit suite name (AMD suites were not renamed in the upstream stage-*base-* rename).
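For readers unfamiliar with the registration pattern, here is a minimal sketch of the shape of the change for test_vit_pos_embed_interpolate.py. The three register_* functions below are local stand-ins defined only so the snippet runs without sglang installed; the keyword names (est_time, suite) follow the table above, but the real signatures in ci_register.py are an assumption here, and the CUDA/CPU suite names are placeholders.

```python
# Hypothetical stand-ins for the real registration helpers; the actual
# implementations and signatures in ci_register.py may differ.
REGISTRY = []


def _make_register(backend):
    def register(*, est_time, suite):
        # Record one registration entry for this test file.
        REGISTRY.append({"backend": backend, "est_time": est_time, "suite": suite})

    return register


register_cpu_ci = _make_register("cpu")
register_cuda_ci = _make_register("cuda")
register_amd_ci = _make_register("amd")

# Existing calls, kept unchanged (suite names here are placeholders).
register_cuda_ci(est_time=20, suite="placeholder-cuda-suite")
register_cpu_ci(est_time=20, suite="placeholder-cpu-suite")

# The only added line: AMD registration with the legacy suite= shape.
register_amd_ci(est_time=20, suite="stage-a-test-1-gpu-small-amd")

amd_entries = [e for e in REGISTRY if e["backend"] == "amd"]
print(amd_entries)
```

The point of the sketch is that the diff is additive: the CUDA and CPU entries are untouched, and exactly one AMD entry appears per file.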

Local verification (AST collector)

Using the vendored python/sglang/test/ci/ci_register.py parser:

stage-a-test-1-gpu-small-amd: 5 -> 6 AMD tests (+ test_vit_pos_embed_interpolate.py, est 20)
stage-b-test-1-gpu-small-amd: 112 -> 113 AMD tests (+ test_minimax_sparse_pool_host_unit.py, est 9)
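As a rough illustration of how an AST-based collector can produce counts like these without importing the test files, the sketch below uses only the standard-library ast module. It is not the vendored ci_register.py parser; the function name collect_registrations and the example source string are hypothetical, and it assumes registrations are written as calls with literal suite= and est_time= keywords.

```python
import ast


def collect_registrations(source, register_name="register_amd_ci"):
    """Return (suite, est_time) for each call to register_name in source.

    Only literal keyword arguments are read; the source is parsed, never run.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both bare names and attribute access (module.register_amd_ci).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != register_name:
            continue
        kwargs = {
            kw.arg: kw.value.value
            for kw in node.keywords
            if kw.arg is not None and isinstance(kw.value, ast.Constant)
        }
        found.append((kwargs.get("suite"), kwargs.get("est_time")))
    return found


# Hypothetical file contents; the CUDA suite name is a placeholder.
EXAMPLE_SOURCE = '''
register_cuda_ci(est_time=20, suite="placeholder-cuda-suite")
register_amd_ci(est_time=20, suite="stage-a-test-1-gpu-small-amd")
'''

amd = collect_registrations(EXAMPLE_SOURCE)
print(amd)
```

Running a collector of this kind over the test tree before and after the change, and grouping by suite, is one way to arrive at before/after counts such as 5 -> 6 and 112 -> 113.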

Test plan

  • AMD stage-a-test-1-gpu-small-amd passes on test_vit_pos_embed_interpolate.py
  • AMD stage-b-test-1-gpu-small-amd passes on test_minimax_sparse_pool_host_unit.py
  • rocm720 lane green on both files
  • NVIDIA CI still green (registration-only change)

CI States

Latest PR Test (Base): ⏳ Run #28405097298
Latest PR Test (Extra): ❌ Run #28405113548

These tests landed recently on NVIDIA per-commit CI and are AMD-safe with
no code changes:

- models/test_vit_pos_embed_interpolate.py: pure torch embedding math,
  asserted bit-exact, already runs on CPU and CUDA.
- unit/mem_cache/test_minimax_sparse_pool_host_unit.py: CPU integration
  class + a device/host transfer class that already enables is_hip().

Adding register_amd_ci(...) next to register_cuda_ci(...) is sufficient.

AST collector: stage-a-test-1-gpu-small-amd 5 -> 6,
stage-b-test-1-gpu-small-amd 112 -> 113.

gemini-code-assist (Contributor) commented

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
