Skip to content

Conversation

@Clorist33
Copy link
Contributor

@Clorist33 Clorist33 commented Dec 5, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request primarily addresses a bug in the MoE MLP implementation by correctly formatting the group_index parameter for the npu_dequant_swiglu_quant operation, which appears to be a critical fix. The PR also includes several updates to test files to adapt to other changes and improve robustness. However, I've identified a critical issue in one of the test file changes, tests/ut/eplb/core/test_eplb_device_transfer_loader.py, which seems to introduce a failing test and points to an underlying bug in the expert weight loading logic that is not addressed in this PR.

loader_obj.generate_expert_d2d_transfer_task([], [], {}, 0)
assert loader_obj.comm_op_list is None
# assert loader_obj.comm_op_list is None
assert loader_obj.comm_op_list == []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This assertion change will cause the test to fail with the current implementation.

The generate_expert_d2d_transfer_task function returns early when expert_send_info and expert_recv_info are empty, without modifying loader_obj.comm_op_list. Since comm_op_list is None from initialization and is not changed by the first call in this test, it remains None. Therefore, the assertion assert loader_obj.comm_op_list == [] will fail. The original assertion assert loader_obj.comm_op_list is None was correct.

This change likely points to an underlying bug in generate_expert_d2d_transfer_task located in vllm_ascend/eplb/core/eplb_device_transfer_loader.py. The function doesn't reset self.comm_op_list when no new transfer tasks are generated. This could lead to asyn_expert_weight_transfer re-executing stale communication operations from a previous call, which is a serious bug.

A potential fix in vllm_ascend/eplb/core/eplb_device_transfer_loader.py would be to reset self.comm_op_list at the beginning of generate_expert_d2d_transfer_task. For example:

# In vllm_ascend/eplb/core/eplb_device_transfer_loader.py
def generate_expert_d2d_transfer_task(self, ...):
    if self.state != ExpertWeightUpdateState.WAITING:
        # ...
        return

    self.comm_op_list = []  # Reset here

    if not (expert_send_info or expert_recv_info):
        return
    # ...

With such a fix in the implementation, this test change would be correct. As it stands, this PR introduces a failing test.

tanqingshan (A) added 2 commits December 5, 2025 15:39
Signed-off-by: tanqingshan (A) <50050625@china.huawei.com>
Signed-off-by: tanqingshan (A) <50050625@china.huawei.com>
@Clorist33 Clorist33 force-pushed the Bugfix_moe_mlp_4490 branch from ef6c5c3 to 1021a57 Compare December 5, 2025 07:40
@github-actions
Copy link

github-actions bot commented Dec 5, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

Signed-off-by: tanqingshan (A) <50050625@china.huawei.com>
@Clorist33 Clorist33 force-pushed the Bugfix_moe_mlp_4490 branch from e26453e to 945aab4 Compare December 5, 2025 08:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant