add max 2d subarray sum #81
base: main
Pull request overview
This PR adds a new medium-difficulty coding challenge called "Max 2D Subarray Sum" that requires computing the maximum sum of any contiguous 2D subarray of a fixed window size. The challenge includes starter templates for multiple GPU programming frameworks and comprehensive test cases.
- Implements reference solution using 2D prefix sum approach with PyTorch
- Provides starter templates for Triton, PyTorch, Mojo, and CUDA implementations
- Includes example, functional, and performance test cases with various edge cases
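For orientation, the 2D prefix-sum technique mentioned above can be sketched in plain Python. This is a minimal illustration of the approach, not the PR's PyTorch reference solution; the function name `max_window_sum` is hypothetical.

```python
# Illustrative sketch of the 2D prefix-sum approach: build a padded
# (n+1) x (n+1) prefix table, then evaluate each k x k window by
# inclusion-exclusion in O(1) per window.

def max_window_sum(grid, window_size):
    """Max sum over all window_size x window_size subarrays of a square grid."""
    n = len(grid)
    k = window_size
    # psum[i][j] = sum of grid[:i][:j]; the extra row/column of zeros
    # avoids bounds checks at the top and left edges.
    psum = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            psum[i + 1][j + 1] = (grid[i][j] + psum[i][j + 1]
                                  + psum[i + 1][j] - psum[i][j])
    best = None
    for i in range(n - k + 1):
        for j in range(n - k + 1):
            # window sum via inclusion-exclusion over the padded table
            s = (psum[i + k][j + k] - psum[i][j + k]
                 - psum[i + k][j] + psum[i][j])
            best = s if best is None else max(best, s)
    return best

print(max_window_sum([[1, 2], [3, 4]], 1))  # 4
print(max_window_sum([[1, 2], [3, 4]], 2))  # 10
```

The padded table is the same trick the reference solution's `padded` tensor implements, which is why the review comments below focus on how that tensor is constructed.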
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| challenges/medium/55_max_2d_subarray_sum/challenge.py | Challenge implementation with reference solution, test generation, and function signatures |
| challenges/medium/55_max_2d_subarray_sum/challenge.html | HTML documentation describing the problem, examples, and constraints |
| challenges/medium/55_max_2d_subarray_sum/starter/starter.triton.py | Triton starter template with function signature |
| challenges/medium/55_max_2d_subarray_sum/starter/starter.pytorch.py | PyTorch starter template with function signature |
| challenges/medium/55_max_2d_subarray_sum/starter/starter.mojo | Mojo starter template with function signature |
| challenges/medium/55_max_2d_subarray_sum/starter/starter.cu | CUDA starter template with function signature |
```python
assert output.dtype == torch.int32

psum = input.cumsum(dim=0).cumsum(dim=1)
padded = torch.zeros((N+1, N+1), dtype=torch.int32)
```

Copilot AI · Jan 3, 2026

The `padded` tensor is created on the CPU by default by `torch.zeros`, but the `input` tensor is on the GPU (a `cuda` device). This causes a runtime error when `psum` is assigned into `padded[1:, 1:]`, since both tensors must be on the same device. `padded` should be created on the same device as the input tensor:

```diff
- padded = torch.zeros((N+1, N+1), dtype=torch.int32)
+ padded = torch.zeros((N+1, N+1), dtype=torch.int32, device=input.device)
```
```python
    "window_size": 7
})

# increasing_sequence
```

Copilot AI · Jan 3, 2026

The comment says "increasing_sequence", but the test generates random integers with `torch.randint(-10, 11, ...)`. The comment is misleading and should describe what the test actually does, such as "mixed_positive_negative" or "random_values":

```diff
- # increasing_sequence
+ # mixed_positive_negative
```
```mojo
from math import ceildiv

# input, output are device pointers (i.e. pointers to memory on the GPU)
@export
```

Copilot AI · Jan 3, 2026

There is trailing whitespace after the `@export` decorator; it should be removed for code cleanliness.
```cuda
// input, output are device pointers (i.e. pointers to memory on the GPU)
extern "C" void solve(const int* input, int* output, int N, int window_size) {
```

Copilot AI · Jan 3, 2026

There is trailing whitespace on this line; it should be removed for code cleanliness.
```diff
@@ -0,0 +1,7 @@
+import torch
+import triton
```

Copilot AI · Jan 3, 2026

The `triton` import is unused and should be removed.
```diff
@@ -0,0 +1,7 @@
+import torch
+import triton
+import triton.language as tl
```

Copilot AI · Jan 3, 2026

The `tl` import (`triton.language`) is unused and should be removed.