[CPU][Perf] Accelerate unquantized MoE for AArch64 by fadara01 · Pull Request #46353 · vllm-project/vllm

fadara01 · 2026-06-22T08:36:06Z

Purpose

Accelerate unquantized MoE for AArch64

Enable the FusedMoE kernel for AArch64
Implement an AdvSIMD BFMMLA interface to accelerate the w13 and w2 GEMMs
Extend the generic micro-kernel interface and the MoE kernel to support packing the input matrix
Abstract the sleef.h includes and the tanh symbol for x86 under the AVX vectorizer class
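
For readers unfamiliar with BFMMLA, the following is a minimal scalar sketch in Python of what the instruction computes and why the input matrix is packed for it. It is not the code in this PR: the function names (`to_bf16`, `bfmmla`, `pack_2x4`, `gemm_bfmmla`) and the tile layout are illustrative assumptions, and the accumulation here uses Python floats, so rounding can differ from the hardware. The only architectural fact relied on is that BFMMLA multiplies a 2x4 BF16 tile by a 4x2 BF16 tile (supplied as two rows of four, i.e. the columns of B) and accumulates into a 2x2 FP32 tile.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    Illustrative only: NaN and overflow handling is omitted.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round on the 16 dropped bits
    bits &= 0xFFFF0000                   # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bfmmla(acc, a, b):
    """Scalar model of one BFMMLA: acc (2x2) += a (2x4) @ b^T, b given as 2x4.

    acc has 4 values, a and b have 8 values each, all row-major.
    """
    out = list(acc)
    for i in range(2):
        for j in range(2):
            out[2 * i + j] += sum(a[4 * i + k] * b[4 * j + k] for k in range(4))
    return out


def pack_2x4(mat, rows, cols):
    """Pack a rows x cols matrix into zero-padded 2x4 tiles.

    Returns tiles[row_pair][k_block], each a flat list of 8 BF16-rounded values.
    This layout is a hypothetical stand-in for the packing done in the kernel.
    """
    tiles = []
    for r0 in range(0, rows, 2):
        row_tiles = []
        for k0 in range(0, cols, 4):
            tile = [0.0] * 8
            for i in range(2):
                for k in range(4):
                    if r0 + i < rows and k0 + k < cols:
                        tile[4 * i + k] = to_bf16(mat[r0 + i][k0 + k])
            row_tiles.append(tile)
        tiles.append(row_tiles)
    return tiles


def gemm_bfmmla(a, b, m, k, n):
    """Compute C = A (m x k) @ B (k x n) using only the 2x2 tile operation."""
    b_t = [[b[kk][j] for kk in range(k)] for j in range(n)]  # columns of B as rows
    a_tiles = pack_2x4(a, m, k)
    b_tiles = pack_2x4(b_t, n, k)
    c = [[0.0] * n for _ in range(m)]
    for ti, a_row in enumerate(a_tiles):
        for tj, b_row in enumerate(b_tiles):
            acc = [0.0] * 4
            for a_tile, b_tile in zip(a_row, b_row):  # walk the K dimension
                acc = bfmmla(acc, a_tile, b_tile)
            for i in range(2):
                for j in range(2):
                    r, col = 2 * ti + i, 2 * tj + j
                    if r < m and col < n:  # drop the zero padding
                        c[r][col] = acc[2 * i + j]
    return c
```

Packing both operands into contiguous 2x4 tiles means each tile maps onto one 128-bit vector register, which is why a kernel built on this instruction benefits from a packed input matrix as well as packed weights.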

Performance

Throughput is 1.96x higher for gpt-oss and 2.18x higher for gemma4 on 64 Neoverse-V2 cores, measured with the benchmark below.

# Model under test; swap in the commented line to benchmark gemma4 instead.
MODEL=unsloth/gpt-oss-20b-BF16
#MODEL=google/gemma-4-26B-A4B-it
# gemma4 needs VLLM_CPU_ATTN_SPLIT_KV=0 because attention currently hangs without it.
#export VLLM_CPU_ATTN_SPLIT_KV=0
vllm bench throughput \
  --num-prompts 128 \
  --seed 0 \
  --dataset-name sonnet \
  --dataset-path /home/fadara01/vllm-moe/vllm/benchmarks/sonnet.txt \
  --input-len 256 \
  --output-len 256 \
  --max-model-len 4096 \
  --max-num-batched-tokens 4096 \
  --model "$MODEL" \
  --tensor-parallel-size 1 \
  --no-enable-prefix-caching \
  --num-warmups 5

Test Plan

CI

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

fadara01 · 2026-06-22T09:05:23Z

Hi @mgoin @bigPYJ1151 :)

Could you please have a look at this?

mergify · 2026-06-22T13:12:32Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fadara01.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>

fadara01 requested review from AndreasKaratzas, Harry-Chen, WoosukKwon, bigPYJ1151, khluu, mgoin, pavanimajety, tlrmchlsmth, yewentao256 and zyongye as code owners June 22, 2026 08:36

mergify Bot added the ci/build, performance (Performance-related issues), and gpt-oss (Related to GPT-OSS models) labels Jun 22, 2026

mergify Bot assigned fadara01 Jun 22, 2026

github-project-automation Bot added this to gpt-oss Issues & Enhancements Jun 22, 2026

mergify Bot added the cpu (Related to CPU backends) label Jun 22, 2026

github-project-automation Bot moved this to To Triage in gpt-oss Issues & Enhancements Jun 22, 2026

fadara01 mentioned this pull request Jun 22, 2026

[CPU] Support Gemma Diffusion #45690

Merged

4 tasks

fadara01 marked this pull request as draft June 22, 2026 10:25

fadara01 marked this pull request as ready for review June 22, 2026 11:42

mergify Bot added the needs-rebase label Jun 22, 2026

bigPYJ1151 reviewed Jun 23, 2026


Comment thread csrc/cpu/cpu_fused_moe.cpp Outdated

fadara01 added 2 commits June 23, 2026 16:05

[CPU][Perf] Accelerate unquantized MoE for AArch64

d075a52

Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>

fix cache aware w2 tile sizing to not count non-packed weight

97cd6ef

Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>

fadara01 force-pushed the fused_moe_arm branch from 26f52a9 to 97cd6ef June 23, 2026 16:24

fadara01 requested a review from bigPYJ1151 June 23, 2026 16:25

fadara01 wants to merge 2 commits into vllm-project:main from fadara01:fused_moe_arm
