
Conversation


@atalman atalman commented Dec 5, 2025

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting a before/after results comparison, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the project to use PyTorch nightly builds by pointing the Dockerfiles at the nightly package indices and removing the version pins for torch and related libraries in the requirements files. The changes are generally in the right direction for supporting nightly builds. However, I've identified two issues: a critical one in the main Dockerfile that will cause the build to fail, and a high-severity one in the CMake configuration that will emit spurious version warnings. Addressing these will ensure the transition to nightly builds is smooth and maintainable.


 # PyTorch provides its own indexes for standard and nightly builds
-ARG PYTORCH_CUDA_INDEX_BASE_URL=https://download.pytorch.org/whl
+ARG PYTORCH_CUDA_INDEX_BASE_URL=https://download.pytorch.org/whl/nightly

critical

This change to use the nightly PyTorch index will cause the Docker build to fail. CUDA_VERSION is set to 12.9.1 at the top of the file, which leads to an invalid wheel index URL, .../nightly/cu129, being constructed on line 143. PyTorch nightly builds are not published for cu129 (the latest available is cu128).

To fix this, you should either:

  1. Change CUDA_VERSION to a version that has a corresponding PyTorch nightly build (e.g., 12.8.x). This might require changing the base image.
  2. Hardcode a valid CUDA version for the PyTorch index URL in the uv pip install command. Given that CUDA_VERSION is used for the base image, this option is likely safer.

For example, you could change line 143 to:

-    --extra-index-url ${PYTORCH_CUDA_INDEX_BASE_URL}/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.')
+    --extra-index-url ${PYTORCH_CUDA_INDEX_BASE_URL}/cu128
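For context, the suffix derivation that produces the broken URL can be reproduced outside the build. A minimal sketch, assuming CUDA_VERSION carries the same value as the Dockerfile's top-level ARG:

```shell
#!/bin/sh
# Reproduce the Dockerfile's CUDA index-suffix derivation in isolation.
CUDA_VERSION=12.9.1   # value assumed from the top of the Dockerfile
# Keep the first two version components, then drop the dot: 12.9.1 -> cu129
suffix="cu$(echo "$CUDA_VERSION" | cut -d. -f1,2 | tr -d '.')"
echo "$suffix"        # cu129 -- a suffix with no published nightly wheels
```

This makes the failure mode concrete: any 12.9.x base image yields cu129, so either the base image version or the index suffix has to change.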

Comment on lines +59 to +60
set(TORCH_SUPPORTED_VERSION_CUDA "2.10.0")
set(TORCH_SUPPORTED_VERSION_ROCM "2.10.0")

high

With the move to nightly builds, the version check for CUDA at line 125 will consistently fail and produce warnings. Nightly versions (e.g., 2.10.0.dev...) will not match 2.10.0 with VERSION_EQUAL.

This will create unnecessary noise in build logs, potentially hiding real issues. Please consider making the check more tolerant of nightly versions, for example by using VERSION_GREATER_EQUAL or a MATCHES regex, similar to how the ROCm version is checked.
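One possible relaxation, sketched under the assumption that the detected version lives in a TORCH_VERSION variable (the actual variable name in the project's CMakeLists.txt was not verified):

```cmake
# Accept the release and its nightly pre-releases (e.g. 2.10.0.dev20251205):
# strip any ".dev..." suffix before comparing, instead of requiring VERSION_EQUAL.
string(REGEX REPLACE "\\.dev.*$" "" TORCH_BASE_VERSION "${TORCH_VERSION}")
if(NOT TORCH_BASE_VERSION VERSION_GREATER_EQUAL TORCH_SUPPORTED_VERSION_CUDA)
  message(WARNING
    "PyTorch version ${TORCH_VERSION} is older than the supported "
    "${TORCH_SUPPORTED_VERSION_CUDA}; the build may fail.")
endif()
```

This keeps a warning for genuinely old installs while letting 2.10.0.dev* nightlies pass silently.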

@mergify
Copy link

mergify bot commented Dec 5, 2025

Hi @atalman, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 5 to 9
 setuptools>=77.0.3,<81.0.0
 setuptools-scm>=8
-torch==2.9.0
+torch
 wheel
 jinja2>=3.1.6


P1: Unpinned build torch resolves to an unsupported release

The build requirements now allow any torch version (requirements/build.txt line 7) while CMake declares 2.10 as the supported CUDA/ROCm version (CMakeLists.txt lines 59-60). With the default PyPI index this resolves to the current stable CPU wheel (e.g., 2.x) rather than the expected 2.10 GPU nightly, so a developer running uv pip install -r requirements/build.txt will compile the extensions against an unsupported CPU build that lacks the CUDA ABI and typically fails at build or import time. Please bound the requirement to the supported 2.10 nightly channel instead of leaving it unconstrained.
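One way to bound the requirement, sketched under the assumption that the nightly channel publishes 2.10.0.devYYYYMMDD wheels (the exact bounds are illustrative, not taken from the repo):

```
# requirements/build.txt (sketch): keep torch on the 2.10 pre-release line.
# The dev specifier opts this requirement into pre-releases, but the wheels
# must still come from the nightly index rather than PyPI, e.g.:
#   uv pip install -r requirements/build.txt \
#     --extra-index-url https://download.pytorch.org/whl/nightly/cu128
torch>=2.10.0.dev0,<2.11
```

This keeps a plain `uv pip install` from silently resolving to the stable PyPI wheel while still tracking the nightly stream.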

Useful? React with 👍 / 👎.


Labels

ci/build · nvidia · rocm (Related to AMD ROCm)
