[WIP] Nightly signal on torch #30146
base: main
Conversation
Code Review
This pull request updates the project to use PyTorch nightly builds by modifying Dockerfiles to point to nightly package indices and removing version pins for torch and related libraries in the requirements files. The changes are generally in the right direction for supporting nightly builds. However, I've identified two issues: one critical issue in the main Dockerfile that will cause the build to fail, and one high-severity issue in the CMake configuration that will produce incorrect warnings. Addressing these will ensure the transition to nightly builds is smooth and maintainable.
```diff
  # PyTorch provides its own indexes for standard and nightly builds
- ARG PYTORCH_CUDA_INDEX_BASE_URL=https://download.pytorch.org/whl
+ ARG PYTORCH_CUDA_INDEX_BASE_URL=https://download.pytorch.org/whl/nightly
```
This change to use the nightly PyTorch index will cause the Docker build to fail. The CUDA_VERSION is set to 12.9.1 at the top of the file, which leads to the construction of an invalid URL .../nightly/cu129 for PyTorch wheels on line 143. The PyTorch nightly builds are not available for cu129 (the latest is cu128).
To fix this, you should either:
- Change `CUDA_VERSION` to a version that has a corresponding PyTorch nightly build (e.g., `12.8.x`). This might require changing the base image.
- Hardcode a valid CUDA version for the PyTorch index URL in the `uv pip install` command. Given that `CUDA_VERSION` is used for the base image, this option is likely safer.
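For reference, the suffix construction on line 143 can be sketched in Python (the `cut`/`tr` pipeline in the Dockerfile is equivalent; the function name here is illustrative, not from the PR):

```python
def cuda_index_suffix(cuda_version: str) -> str:
    # Mirrors the Dockerfile pipeline: keep major.minor, drop the dot, prefix "cu"
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"

# With CUDA_VERSION=12.9.1 the nightly index path becomes .../nightly/cu129,
# for which no nightly wheels are published
print(cuda_index_suffix("12.9.1"))  # cu129
print(cuda_index_suffix("12.8.1"))  # cu128
```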
For example, you could change line 143 to:
```diff
- --extra-index-url ${PYTORCH_CUDA_INDEX_BASE_URL}/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.')
+ --extra-index-url ${PYTORCH_CUDA_INDEX_BASE_URL}/cu128
```

```
set(TORCH_SUPPORTED_VERSION_CUDA "2.10.0")
set(TORCH_SUPPORTED_VERSION_ROCM "2.10.0")
```
With the move to nightly builds, the version check for CUDA at line 125 will consistently fail and produce warnings: nightly versions (e.g., 2.10.0.dev...) will not satisfy a `VERSION_EQUAL` comparison against 2.10.0.
This will create unnecessary noise in build logs, potentially hiding real issues. Please consider making the check more flexible for nightly versions, for example by using `VERSION_GREATER_EQUAL` or a `MATCHES` regex on the version prefix, similar to how the ROCm version is checked.
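To make the intent concrete, here is a small Python sketch of why exact equality rejects nightly builds while a base-version prefix check accepts them (the nightly version string format is an assumption based on typical PyTorch nightly tags; the helper name is illustrative):

```python
SUPPORTED = "2.10.0"

def version_ok(torch_version: str) -> bool:
    # Strip the local segment (+cu128) and any .dev suffix, then compare
    # the remaining base release against the supported version.
    base = torch_version.split("+")[0].split(".dev")[0]
    return base == SUPPORTED

print(version_ok("2.10.0"))                    # True
print(version_ok("2.10.0.dev20251105+cu128"))  # True  (nightly accepted)
print(version_ok("2.9.1"))                     # False
```

A strict string equality against `"2.10.0"` would return `False` for the nightly tag, which is exactly the warning noise the comment describes.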
Hi @atalman, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch. For future commits, the installed hook will run these checks automatically.
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
  setuptools>=77.0.3,<81.0.0
  setuptools-scm>=8
- torch==2.9.0
+ torch
  wheel
  jinja2>=3.1.6
```
Unpinned build torch resolves to unsupported release
The build requirements now allow any torch version (requirements/build.txt line 7) while CMake declares 2.10 as the supported CUDA/ROCm version (CMakeLists.txt lines 59-60). With the default PyPI index this resolves to the current stable CPU wheel (e.g., 2.x) rather than the expected 2.10 GPU nightly, so a developer running uv pip install -r requirements/build.txt will compile the extensions against an unsupported CPU build that lacks the CUDA ABI and typically fails at build or import time. Please bound the requirement to the supported 2.10 nightly channel instead of leaving it unconstrained.
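One possible shape for such a bound, shown as a `requirements/build.txt` sketch (the index URL and version bounds here are illustrative assumptions, not taken from the PR; per PEP 440, naming a `.dev` pre-release in the specifier lets pip consider nightly wheels):

```
# requirements/build.txt (sketch, assumed bounds)
--extra-index-url https://download.pytorch.org/whl/nightly/cu128
torch>=2.10.0.dev0,<2.11
```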
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.