Conversation

@kiscad
Contributor

@kiscad kiscad commented Dec 5, 2025

What this PR does / why we need it?

This PR fixes a smoke test failure. It adjusts mtp_proposer and model_runner_v1 to route MTP decoding through the non-fused MoE implementation while keeping the overall inference flow unchanged.

Does this PR introduce any user-facing change?

How was this patch tested?

This PR will be verified by the smoke tests.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a bugfix to disable the fused MoE kernel during the dummy_run of the MTP (Multi-Token Prediction) proposer. This is accomplished by checking whether the selected MoE communication method is FUSED_ALLTOALL and reverting to the standard ALLTOALL method if it is. The change is localized and specifically targets the dummy_run, which is crucial for graph capturing. It correctly addresses a likely bug with the fused kernel in this context, and the implementation is sound. No issues were found in the proposed changes.
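
A minimal sketch of what such a guard could look like. The enum and helper names below are illustrative assumptions, not the actual vllm-ascend code:

```python
# Hypothetical sketch: force the non-fused MoE path during the MTP
# proposer's dummy_run. The names below (MoECommType,
# select_moe_comm_method) are illustrative, not the real vllm-ascend API.
from enum import Enum


class MoECommType(Enum):
    ALLTOALL = "alltoall"
    FUSED_ALLTOALL = "fused_alltoall"  # the fused dispatch_ffn_combine kernel


def select_moe_comm_method(base_type: MoECommType,
                           is_mtp_dummy_run: bool) -> MoECommType:
    # The fused kernel misbehaves in the MTP decoding path, so fall
    # back to the standard all-to-all there; all other paths keep
    # whatever method was originally selected.
    if is_mtp_dummy_run and base_type is MoECommType.FUSED_ALLTOALL:
        return MoECommType.ALLTOALL
    return base_type
```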

@github-actions

github-actions bot commented Dec 5, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description, to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@kiscad kiscad force-pushed the bugfix-fusedmoe branch 2 times, most recently from 2f86c9f to 77e74e4 on December 6, 2025 12:08
@kiscad kiscad changed the title [Bugfix] disable fused-moe kernel in MTP module [Bugfix] Disable fused MoE kernel in MTP decoding path Dec 6, 2025
@kiscad kiscad changed the title [Bugfix] Disable fused MoE kernel in MTP decoding path [Bugfix] Disable the dispatch_ffn_combine kernel in MTP path Dec 6, 2025
@zzzzwwjj
Collaborator

zzzzwwjj commented Dec 8, 2025

  1. The moe_comm_type judgement conditions need to consider whether it is a quant case;
  2. the dispatch_ffn_combine op needs to support the EP>16 case.

These two problems need to be addressed in follow-up work; a rough sketch of the combined condition follows below.
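
A hedged sketch of how both follow-ups might fold into the selection condition. The function and parameter names here are hypothetical:

```python
# Hypothetical follow-up sketch: fold quantization and expert-parallel
# size into the comm-type decision. All names here are assumptions,
# not taken from the actual codebase.
def choose_moe_comm_type(is_quantized: bool, ep_size: int) -> str:
    # (1) the fused path has not been validated for the quant case;
    # (2) dispatch_ffn_combine does not yet support EP > 16.
    if not is_quantized and ep_size <= 16:
        return "fused_alltoall"  # dispatch_ffn_combine
    return "alltoall"
```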

@wangxiyuan wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Dec 8, 2025
@kiscad kiscad force-pushed the bugfix-fusedmoe branch 3 times, most recently from 4273f3c to 37ec423 on December 9, 2025 02:52
Signed-off-by: mojave2 <chenchen145@huawei.com>
@MengqingCao MengqingCao merged commit 848419d into vllm-project:main Dec 9, 2025
18 checks passed