fix: fp16 config and grad scaling #1495
Summary
Ensure FP16 training is consistent and safe by setting required optimizer params, aligning fp16/bf16 flags, and avoiding duplicate grad scaling.
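The following is a minimal sketch, not the actual diff in this PR, of the two invariants the summary refers to: the fp16/bf16 flags must stay mutually exclusive (with the param dtype following them), and the loss must be scaled exactly once per step. The helper names (`align_precision_flags`, `train_step`) are hypothetical, and plain PyTorch `GradScaler`/`autocast` stand in for Megatron's own fp16 optimizer wrapper.

```python
# Minimal sketch, not the PR's actual code: helper names are hypothetical and
# the real change lives in the framework's optimizer/config setup.
import torch
from torch.cuda.amp import GradScaler, autocast


def align_precision_flags(cfg: dict) -> dict:
    """Keep the fp16/bf16 flags and the parameter dtype consistent."""
    if cfg.get("fp16") and cfg.get("bf16"):
        raise ValueError("fp16 and bf16 are mutually exclusive")
    cfg["params_dtype"] = (
        torch.half if cfg.get("fp16")
        else torch.bfloat16 if cfg.get("bf16")
        else torch.float32
    )
    return cfg


def train_step(model, optimizer, scaler: GradScaler, batch, target):
    optimizer.zero_grad(set_to_none=True)
    with autocast(dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(batch), target)
    # Scale the loss exactly once here; if the optimizer wrapper also scales
    # gradients internally, they end up multiplied by the scale twice.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```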
Changes
Set `store_param_remainders=False` to reduce memory usage, since this option isn't exposed by Megatron's public config (see the sketch below).
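A hedged sketch of how an unexposed flag like this might be forced through at optimizer construction time; the `build_distributed_adam` wrapper is hypothetical, and the Apex `DistributedFusedAdam` import path and kwarg are assumptions rather than taken from this PR.

```python
# Hypothetical sketch: override store_param_remainders when building the
# optimizer, since Megatron's public config does not expose the flag.
def build_distributed_adam(params, **kwargs):
    kwargs["store_param_remainders"] = False  # not settable via the public config
    # Import path assumed from Apex; the PR may patch a different wrapper.
    from apex.contrib.optimizers.distributed_fused_adam import DistributedFusedAdam
    return DistributedFusedAdam(params, **kwargs)
```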
Results
FP16 shows faster convergence and a lower mismatch between training and rollout.
