@kilinchange
Collaborator

@kilinchange kilinchange commented Dec 19, 2025

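Main changes:

  1. Refactor the ProcessGroup communication operators: split the compute and communication streams for TP/PP communication, and control cross-stream synchronization uniformly through the async_op parameter;
  2. Introduce an abstract ProcessGroup base class with a concrete NCCL-based implementation (ProcessGroupNCCL), to make later extension to multiple devices/backends easier.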
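
A minimal sketch of the interface shape described above, not the code from this PR. Only `ProcessGroup`, `ProcessGroupNCCL`, and `async_op` come from the description; the `Work` handle, the `ProcessGroupFake` backend, `RunDemo`, and all signatures are hypothetical. CUDA/NCCL is replaced by a plain deferred callback so the example runs anywhere. A real NCCL backend would presumably enqueue the collective on a dedicated communication stream and have `Wait()` make the compute stream wait on an event.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical handle for an in-flight collective (name is an assumption).
// Wait() stands in for "make the compute stream wait for the communication".
class Work {
public:
    explicit Work(std::function<void()> pending) : pending_(std::move(pending)) {}
    void Wait() {
        if (pending_) {
            pending_();          // stand-in: run the deferred "communication" now
            pending_ = nullptr;  // a second Wait() is a no-op
        }
    }
private:
    std::function<void()> pending_;
};

// Abstract base class: each backend provides its own collectives.
class ProcessGroup {
public:
    virtual ~ProcessGroup() = default;
    // async_op == false: synchronize before returning; returns nullptr.
    // async_op == true : return a Work handle; the caller must Wait() on it
    //                    before reading `data`.
    virtual std::shared_ptr<Work> AllReduce(std::vector<float>& data, bool async_op) = 0;
};

// Stand-in backend (assumption): simulates a sum over 2 ranks holding
// identical data by doubling every element. No real communication happens.
class ProcessGroupFake : public ProcessGroup {
public:
    std::shared_ptr<Work> AllReduce(std::vector<float>& data, bool async_op) override {
        auto work = std::make_shared<Work>([&data] {
            for (auto& x : data) x *= 2.0f;
        });
        if (!async_op) {
            work->Wait();  // synchronous path: result is ready on return
            return nullptr;
        }
        return work;       // asynchronous path: caller decides when to sync
    }
};

// Runs one all-reduce through the base-class interface and returns the sum.
float RunDemo(bool async_op) {
    ProcessGroupFake pg;
    ProcessGroup& group = pg;
    std::vector<float> grad{1.0f, 2.0f, 3.0f};
    auto work = group.AllReduce(grad, async_op);
    assert((work == nullptr) == !async_op);
    if (work) work->Wait();  // explicit sync point chosen by the caller
    return std::accumulate(grad.begin(), grad.end(), 0.0f);
}
```

In this sketch both modes produce the same result. The difference is only where the synchronization point sits: inside the call when `async_op` is false, or at the caller's `Wait()` when it is true, which is what would let compute overlap with communication.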

…nd communication streams for tp/pp, with stream synchronization controlled via 'async_op'
@kilinchange kilinchange force-pushed the feature/tp-pp-split-stream branch 2 times, most recently from 8a2cb76 to 9d5d774 Compare December 22, 2025 02:31
@kilinchange kilinchange force-pushed the feature/tp-pp-split-stream branch 2 times, most recently from 4735367 to d2eb083 Compare December 22, 2025 02:45
@kilinchange kilinchange changed the title Feature: ecoupling compute and communication streams for tp/pp Feature: decoupling compute and communication streams for tp/pp Dec 22, 2025
@JYMiracle305
Contributor

[image] Earlier PR titles all use feat: ..., so let's make this one consistent too.

@kilinchange kilinchange changed the title Feature: decoupling compute and communication streams for tp/pp feat: decoupling compute and communication streams for tp/pp Dec 22, 2025
@kilinchange kilinchange merged commit 5dffc46 into master Dec 23, 2025
2 checks passed
@kilinchange kilinchange deleted the feature/tp-pp-split-stream branch December 23, 2025 02:38
4 participants