[WIP] Nightly signal on torch #30146
base: main
Conversation
Code Review
This pull request updates the project to use PyTorch nightly builds by modifying Dockerfiles to point to nightly package indices and removing version pins for torch and related libraries in the requirements files. The changes are generally in the right direction for supporting nightly builds. However, I've identified two issues: one critical issue in the main Dockerfile that will cause the build to fail, and one high-severity issue in the CMake configuration that will produce incorrect warnings. Addressing these will ensure the transition to nightly builds is smooth and maintainable.
```diff
  # PyTorch provides its own indexes for standard and nightly builds
- ARG PYTORCH_CUDA_INDEX_BASE_URL=https://download.pytorch.org/whl
+ ARG PYTORCH_CUDA_INDEX_BASE_URL=https://download.pytorch.org/whl/nightly
```
This change to use the nightly PyTorch index will cause the Docker build to fail. The CUDA_VERSION is set to 12.9.1 at the top of the file, which leads to the construction of an invalid URL .../nightly/cu129 for PyTorch wheels on line 143. The PyTorch nightly builds are not available for cu129 (the latest is cu128).
To fix this, you should either:
- Change `CUDA_VERSION` to a version that has a corresponding PyTorch nightly build (e.g., `12.8.x`). This might require changing the base image.
- Hardcode a valid CUDA version for the PyTorch index URL in the `uv pip install` command. Given that `CUDA_VERSION` is used for the base image, this option is likely safer.
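For reference, the suffix construction on line 143 can be sketched in Python (the `cut`/`tr` pipeline in the Dockerfile is equivalent; the function name here is illustrative, not from the PR):

```python
def cuda_index_suffix(cuda_version: str) -> str:
    # Mirrors the Dockerfile pipeline: keep major.minor, drop the dot, prefix "cu"
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"

# With CUDA_VERSION=12.9.1 the nightly index path becomes .../nightly/cu129,
# for which no nightly wheels are published
print(cuda_index_suffix("12.9.1"))  # cu129
print(cuda_index_suffix("12.8.1"))  # cu128
```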
For example, you could change line 143 to:
```diff
- --extra-index-url ${PYTORCH_CUDA_INDEX_BASE_URL}/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.')
+ --extra-index-url ${PYTORCH_CUDA_INDEX_BASE_URL}/cu128
```

```
set(TORCH_SUPPORTED_VERSION_CUDA "2.10.0")
set(TORCH_SUPPORTED_VERSION_ROCM "2.10.0")
```
With the move to nightly builds, the version check for CUDA at line 125 will consistently fail and produce warnings: nightly versions (e.g., 2.10.0.dev...) will not satisfy a `VERSION_EQUAL` comparison against 2.10.0.
This will create unnecessary noise in build logs, potentially hiding real issues. Please consider making the check more flexible for nightly versions, for example by using `VERSION_GREATER_EQUAL` or a `MATCHES` regex on the version prefix, similar to how the ROCm version is checked.
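To make the intent concrete, here is a small Python sketch of why exact equality rejects nightly builds while a base-version prefix check accepts them (the nightly version string format is an assumption based on typical PyTorch nightly tags; the helper name is illustrative):

```python
SUPPORTED = "2.10.0"

def version_ok(torch_version: str) -> bool:
    # Strip the local segment (+cu128) and any .dev suffix, then compare
    # the remaining base release against the supported version.
    base = torch_version.split("+")[0].split(".dev")[0]
    return base == SUPPORTED

print(version_ok("2.10.0"))                    # True
print(version_ok("2.10.0.dev20251105+cu128"))  # True  (nightly accepted)
print(version_ok("2.9.1"))                     # False
```

A strict string equality against `"2.10.0"` would return `False` for the nightly tag, which is exactly the warning noise the comment describes.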
Hi @atalman, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch. For future commits, the installed hook will run these checks automatically.
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
  setuptools>=77.0.3,<81.0.0
  setuptools-scm>=8
- torch==2.9.0
+ torch
  wheel
  jinja2>=3.1.6
```
Unpinned build torch resolves to unsupported release
The build requirements now allow any torch version (requirements/build.txt line 7) while CMake declares 2.10 as the supported CUDA/ROCm version (CMakeLists.txt lines 59-60). With the default PyPI index this resolves to the current stable CPU wheel (e.g., 2.x) rather than the expected 2.10 GPU nightly, so a developer running uv pip install -r requirements/build.txt will compile the extensions against an unsupported CPU build that lacks the CUDA ABI and typically fails at build or import time. Please bound the requirement to the supported 2.10 nightly channel instead of leaving it unconstrained.
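One possible shape for such a bound, shown as a `requirements/build.txt` sketch (the index URL and version bounds here are illustrative assumptions, not taken from the PR; per PEP 440, naming a `.dev` pre-release in the specifier lets pip consider nightly wheels):

```
# requirements/build.txt (sketch, assumed bounds)
--extra-index-url https://download.pytorch.org/whl/nightly/cu128
torch>=2.10.0.dev0,<2.11
```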
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.