Print TensorView for debugging #5903
base: main
Otherwise I couldn't even locate the Expr in a printed fusion.
!test
Description
| Relevant files |
|---|
| Enhancement |
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 PR contains tests |
| ⚡ Recommended focus areas for review |
Error Message Improvement
Test failures (partial, pipeline still running)
- (Medium, 1) Thunder nvFuser scalar mismatch in nanogpt autograd test (test_networks)

| Test Name | A100 | Source |
|---|---|---|
| thunder.tests.test_networks.test_nanogpt_complete_autograd_nvfuser_cuda_thunder.dtypes.float32 | ❌ | |
Greptile Overview
Greptile Summary
This PR adjusts the error messages in broadcast lowering. The functional behavior of broadcast lowering remains the same; only the diagnostics emitted on rank mismatch are changed.
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Caller as convertSingleOpToCommunication
participant Info as getCommunicationInfo
participant Bcast as lowerToBroadcast
participant Mesh as DeviceMesh
Caller->>Info: compute CommunicationInfo(type)
Info-->>Caller: type = Broadcast
Caller->>Bcast: lowerToBroadcast(input_tv, output_tv)
Bcast->>Mesh: sender_mesh = input_tv.getDeviceMesh()
Bcast->>Mesh: receiver_mesh = output_tv.getDeviceMesh()
Bcast->>Bcast: NVF_ERROR_EQ(sender_mesh.rank(), 1, ...)
Bcast->>Bcast: NVF_ERROR_EQ(receiver_mesh.rank(), 1, ...)
Bcast->>Mesh: root = sender_mesh.at(0)
Bcast->>Bcast: team = receiver_mesh.vector()
Bcast-->>Caller: comms.push_back(Communication::Broadcast)
1 file reviewed, 1 comment
NVF_ERROR_EQ(sender_mesh.rank(), 1, "sender: ", input_tv->toString());
NVF_ERROR_EQ(receiver_mesh.rank(), 1, "receiver: ", output_tv->toString());
[P2] Error message regression: the previous message included the actual DeviceMesh ("Given ", sender_mesh/receiver_mesh). The new message prints TensorView::toString() but no longer prints the offending mesh, which can make debugging rank mismatches harder (you can’t see whether rank() is wrong because the mesh is unexpected vs the TV being different).
Consider keeping the original context and appending the TV string, e.g. include sender_mesh/receiver_mesh in the message as well.
Otherwise I couldn't even locate the Expr in a printed fusion.
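A minimal, self-contained sketch of how the two requests in this thread could be combined: the message keeps the offending mesh, as the review comment asks, and also carries the TensorView string, as the PR adds. This is not nvFuser code. `MockMesh`, `checkRankIsOne`, and the message layout are hypothetical stand-ins invented for illustration, and a plain exception takes the place of `NVF_ERROR_EQ`.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a device mesh: only its rank and a printable
// form matter for this sketch. The real DeviceMesh API is not reproduced.
struct MockMesh {
  std::vector<int64_t> shape;

  int64_t rank() const {
    return static_cast<int64_t>(shape.size());
  }

  std::string toString() const {
    std::ostringstream os;
    os << "DeviceMesh{shape=[";
    for (std::size_t i = 0; i < shape.size(); ++i) {
      if (i > 0) {
        os << ", ";
      }
      os << shape[i];
    }
    os << "]}";
    return os.str();
  }
};

// Hypothetical check: throws if the mesh is not rank 1, reporting both the
// TensorView string (to locate the Expr in a printed fusion) and the mesh
// itself (to see why the rank is wrong).
inline void checkRankIsOne(
    const MockMesh& mesh,
    const std::string& role,
    const std::string& tv_string) {
  if (mesh.rank() == 1) {
    return;
  }
  std::ostringstream os;
  os << "Expected a rank-1 mesh but got rank " << mesh.rank() << ". " << role
     << ": " << tv_string << ". Given mesh: " << mesh.toString();
  throw std::runtime_error(os.str());
}
```

In the actual macro call, the analogous change would presumably amount to passing the mesh as one more message argument next to `input_tv->toString()` / `output_tv->toString()`, assuming the mesh type can be streamed into the message as the earlier "Given ..." message suggests.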