
Conversation

@themavik

Summary

GPTRewardModel.forward() crashes with a ZeroDivisionError when the input batch has 0 or 1 samples:

bs = input_ids.shape[0] // 2  # 0 when shape[0] < 2
...
loss = loss / bs  # ZeroDivisionError

Additionally, when bs == 0 the per-sample scoring loop never runs, so nothing is appended to the score lists and torch.stack(chosen_end_scores) fails with a RuntimeError on the empty list.
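A distilled reproduction of the failure mode follows the same pattern as forward() and needs no model weights; the loop body is elided here since only the control flow matters:

import torch

input_ids = torch.randint(0, 100, (1, 8))  # a 1-sample batch
bs = input_ids.shape[0] // 2               # 1 // 2 == 0

loss = 0.0
chosen_end_scores = []
for i in range(bs):                        # never iterates when bs == 0
    pass                                   # (pairwise scoring elided)

try:
    loss = loss / bs                       # ZeroDivisionError: float division by zero
except ZeroDivisionError as e:
    print("crash 1:", e)

try:
    torch.stack(chosen_end_scores)         # RuntimeError: stack expects a non-empty TensorList
except RuntimeError as e:
    print("crash 2:", e)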

Changes

Add an early return guard after the batch size calculation. When bs == 0, the function returns (see the sketch after this list):

  • loss: zero tensor on the input device
  • chosen_end_scores: empty tensor on the input device
  • rejected_end_scores: empty tensor on the input device
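A minimal sketch of the guard, assuming forward() returns its outputs in a dict keyed loss, chosen_end_scores, and rejected_end_scores (the exact signature and return structure in the repo may differ):

import torch

def forward(self, input_ids, attention_mask=None):
    # Paired input: first half of the batch is chosen, second half rejected.
    bs = input_ids.shape[0] // 2

    # Guard for degenerate batches (0 or 1 samples): no complete pairs exist,
    # so return zero loss and empty scores instead of reaching loss / bs.
    if bs == 0:
        device = input_ids.device
        return {
            "loss": torch.tensor(0.0, device=device),
            "chosen_end_scores": torch.empty(0, device=device),
            "rejected_end_scores": torch.empty(0, device=device),
        }

    ...  # existing pairwise loss computation, unchanged

Returning the tensors on input_ids.device keeps downstream code working whether the model runs on CPU or GPU.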

Test Plan

  • Batch with 2+ samples (normal paired input): behavior unchanged
  • Batch with 1 sample: returns zero loss and empty scores instead of crashing
  • Batch with 0 samples: returns zero loss and empty scores instead of crashing
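A rough way to exercise the degenerate cases, assuming model is an initialized GPTRewardModel (construction is repo-specific and omitted here):

import torch

for n in (0, 1):                           # 0-sample and 1-sample batches
    ids = torch.randint(0, 100, (n, 8), dtype=torch.long)
    out = model(input_ids=ids, attention_mask=torch.ones_like(ids))
    assert out["loss"].item() == 0.0
    assert out["chosen_end_scores"].numel() == 0
    assert out["rejected_end_scores"].numel() == 0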

Fixes #609

When the input batch has 0 or 1 samples, bs = input_ids.shape[0] // 2
evaluates to 0, causing a ZeroDivisionError at loss = loss / bs and a
RuntimeError at torch.stack on the empty score lists.

Add an early return when bs == 0, returning zero loss and empty score
tensors on the correct device.

Fixes CarperAI#609

Co-authored-by: Cursor <cursoragent@cursor.com>