feat: add distributer algorithm #459
base: main
Conversation
seed_t = sync_tensor(seed_t, dim=0, group=None)
seed_t = seed_t.chunk(world_size, dim=0)[0]
seed = seed_t.item()
seed -= torch.iinfo(torch.int64).min
Bug: Incorrect seed calculation produces excessively large values
The seed calculation subtracts torch.iinfo(torch.int64).min (which equals -2^63) from the seed, effectively adding 2^63. Since torch.randint already produces non-negative values in [0, 2^63-1), this subtraction results in seed values in [2^63, 2^64-1), which are extremely large. This appears unintentional - the seed is already suitable for manual_seed() without this transformation. The unnecessary arithmetic could cause overflow issues or unexpected behavior with the random number generator.
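For illustration, a minimal sketch of the intended seed synchronization without the offset. The PR's `sync_tensor` and `world_size` helpers are replaced here with a plain `torch.distributed.broadcast`, so the names and call shape below are assumptions rather than the PR's actual code:

```python
import torch
import torch.distributed as dist

def synchronized_seed(group=None) -> int:
    # Draw a non-negative seed in [0, 2**63 - 1); this range is already valid
    # for torch.manual_seed(), so no "seed -= iinfo(int64).min" offset is needed.
    seed_t = torch.randint(0, torch.iinfo(torch.int64).max, (1,), dtype=torch.int64)
    if dist.is_available() and dist.is_initialized():
        # Broadcasting rank 0's draw makes every rank agree on the same seed,
        # which matches the intent of syncing the tensor and taking chunk 0.
        dist.broadcast(seed_t, src=0, group=group)
    return int(seed_t.item())

torch.manual_seed(synchronized_seed())
```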
    torch.Tensor
        The gradient of the output tensor.
    """
    return ring_attention._scaled_dot_product_ring_flash_attention_backward(*args, **kwargs)
Bug: Incomplete backward pass missing saved tensors for gradient computation
The LocalFunc autograd function's backward method is incomplete. The forward method doesn't call ctx.save_for_backward() to save the tensors needed for gradient computation (mesh, query, key, value, output, lse). The backward method only receives gradient outputs via *args and passes them directly to _scaled_dot_product_ring_flash_attention_backward, but this function typically requires the original inputs and outputs to compute input gradients. This would cause training (backward pass) to fail with incorrect arguments or missing data.
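For reference, a minimal sketch of the usual `ctx.save_for_backward()` pattern the comment describes. `ring_fwd` and `ring_bwd` are hypothetical stand-ins for the private ring-attention kernels; their real names and argument order are defined in the PR, not here:

```python
import torch

class RingAttnFunc(torch.autograd.Function):
    """Sketch only: saves the activations in forward so backward can use them."""

    @staticmethod
    def forward(ctx, query, key, value, ring_fwd, ring_bwd):
        out, lse = ring_fwd(query, key, value)
        ctx.ring_bwd = ring_bwd                              # non-tensor state lives on ctx
        ctx.save_for_backward(query, key, value, out, lse)   # tensors needed for gradients
        return out

    @staticmethod
    def backward(ctx, grad_out):
        query, key, value, out, lse = ctx.saved_tensors
        # The backward kernel needs the saved inputs/outputs, not just grad_out.
        dq, dk, dv = ctx.ring_bwd(grad_out, query, key, value, out, lse)
        # One gradient per forward input; the two callables get None.
        return dq, dk, dv, None, None
```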
johannaSommer left a comment:
LGTM! There is one hook missing in smash.py that checks for this ring attention algorithm and spawns the distributed server; otherwise no notes 🌻
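Purely as an illustration of the kind of hook meant here, under the assumption that the smash configuration behaves like a plain dict; the config key, the algorithm name, and `spawn_distributed_server` are hypothetical placeholders, not smash.py's actual API:

```python
import multiprocessing as mp

def spawn_distributed_server(config: dict) -> mp.Process:
    # Hypothetical placeholder for whatever actually launches the server process.
    proc = mp.Process(target=print, args=("distributed server started",), daemon=True)
    proc.start()
    return proc

def maybe_spawn_distributed_server(config: dict) -> None:
    # Only spawn when the ring attention algorithm was selected.
    if config.get("distributer") == "ring_attn":
        spawn_distributed_server(config)
```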
Description
Adding the ring_attn algorithm.
Type of Change
How Has This Been Tested?
I ran the tests
Checklist