Contributor

@Flakebi Flakebi commented Dec 14, 2025

The gpu-kernel calling convention has several restrictions that were not enforced by the compiler until now.
Add the following restrictions:

  1. Cannot be async
  2. Cannot be called
  3. Cannot return values; the return type must be () or !
  4. Arguments should be simple, i.e. passed by value. More complicated types can work, but only if you know the ABI and compiler internals, and the resulting behavior is rather unintuitive.
  5. Export name should be unmangled, either through no_mangle or export_name. Kernels are looked up by name on the CPU side, so a mangled name makes them hard to find and is almost always unintentional.

Tracking issue: #135467
amdgpu target tracking issue: #135024

@workingjubilee, these should be all the restrictions we talked about a year ago.

cc @RDambrosio016 @kjetilkjeka for nvptx

Collaborator

rustbot commented Dec 14, 2025

r? @WaffleLapkin

rustbot has assigned @WaffleLapkin.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot added the S-waiting-on-review and T-compiler labels Dec 14, 2025
Collaborator

rustbot commented Dec 14, 2025

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@WaffleLapkin
Member

r? workingjubilee

As I'm completely missing context

Collaborator

rustbot commented Dec 15, 2025

workingjubilee is currently at their maximum review capacity.
They may take a while to respond.

Member

@workingjubilee workingjubilee left a comment

The AST-level code looks good.

Some details on messaging here. I'm not committed to a precise message on these, which is why it's a bit "multiple choice" here, just wondering if these could be improved. In one or two cases it is a must-change.

Should we be enforcing a maximum number of arguments, also? Probably not if there's no cross-driver consensus on that, but maybe?

functional record update syntax requires a struct
hir_typeck_gpu_kernel_abi_cannot_be_called =
functions with the "gpu-kernel" ABI cannot be called
Member

should we be more specific?

Suggested change
- functions with the "gpu-kernel" ABI cannot be called
+ functions with the "gpu-kernel" ABI cannot be called from Rust
Suggested change
- functions with the "gpu-kernel" ABI cannot be called
+ functions with the "gpu-kernel" ABI cannot be called, even in device code

Contributor Author

It shouldn’t be called from C or the CPU either ;)
See also my other comment: I think nobody can call a gpu-kernel; it can only be “launched” (which is a different thing!).

hir_typeck_gpu_kernel_abi_cannot_be_called =
functions with the "gpu-kernel" ABI cannot be called
.note = an `extern "gpu-kernel"` function can only be launched on the GPU through an API
Member

I'm not sure "can" is right? Maybe "should"? And "an API" is somewhat vague, what kind of API? Surely not a standard library one, for instance...

Suggested change
- .note = an `extern "gpu-kernel"` function can only be launched on the GPU through an API
+ .note = an `extern "gpu-kernel"` function should only be launched on the GPU through a driver's API

Or "must"?

...I don't know if "kernel loader" is a real term for the thing that the GPU driver "does", it just came up intuitively while trying to grasp for words for the thing.

Suggested change
- .note = an `extern "gpu-kernel"` function can only be launched on the GPU through an API
+ .note = an `extern "gpu-kernel"` function must only be launched on the GPU by the kernel loader

Contributor Author

It’s vague because there are so many APIs 😄
A gpu-kernel must be loaded and launched through CUDA/HIP/SYCL/HSA or any wrapper around these APIs.
“Driver” is often used in the graphics space, but I don’t see it often when it comes to GPGPU.
The kernel loader is also part of CUDA/HIP/…, but it is not exactly the piece that launches (just as, when launching a process on the CPU, one doesn’t call ld but the kernel, even though ld is involved).

The “must” is definitely better; I’ll also remove the “only”.

I think the distinctive part is that only the (GPU) hardware is allowed to start gpu-kernels; they must not be called (from anywhere or by anyone).
I guess it’s similar to _start on CPUs?
Note: Launching a kernel is not really a call. It’s 1-to-n (launching n instances of the kernel), and it’s async/doesn’t return.
Maybe “must be launched on the GPU by the hardware” is better?

(Technically, one can call the _start function on a CPU, and similarly one can “call” a gpu-kernel on the GPU, since it’s just an address pointing to instructions, but one cannot use the gpu-kernel ABI in the call instruction.)
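
As a rough host-side analogy for the launch-versus-call distinction above, here is a sketch using plain threads. The names `kernel_body` and `launch` are hypothetical and no GPU API is involved. Unlike a real launch, which is asynchronous, this model waits for all instances to finish so the result can be inspected.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Hypothetical stand-in for a kernel body: it receives its instance index
/// and has no return value, so results go out through memory.
fn kernel_body(instance: u32, out: &AtomicU32) {
    out.fetch_add(instance, Ordering::Relaxed);
}

/// Hypothetical "launch": starts `n` instances of the same body (1-to-n).
/// Assumption for this model only: it joins the instances before returning.
fn launch(n: u32, out: &AtomicU32) {
    thread::scope(|s| {
        for instance in 0..n {
            s.spawn(move || kernel_body(instance, out));
        }
    });
}

fn main() {
    let out = AtomicU32::new(0);
    launch(4, &out);
    // Instances 0, 1, 2 and 3 each added their own index.
    println!("{}", out.load(Ordering::Relaxed)); // prints 6
}
```

The point of the model is only that `launch` hands back nothing from the body itself: there is one launch, `n` executions, and all results travel through memory.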

/// This lint is issued when it detects a probable mistake in a signature.
IMPROPER_GPU_KERNEL_ARG,
Warn,
"simple arguments of gpu-kernel functions"
Member

This line should capture the reason for the lint, not what it is checking for, so something like

Suggested change
- "simple arguments of gpu-kernel functions"
+ "GPU kernel entry points have a limited calling convention"

Contributor Author

👍 I’ll call it ABI instead of calling convention, since Rust seems to call it ABI in most places. (Unless there’s a difference I’m missing.)

@rustbot added the S-waiting-on-author label and removed the S-waiting-on-review label Dec 16, 2025
Collaborator

rustbot commented Dec 16, 2025

Reminder, once the PR becomes ready for a review, use @rustbot ready.

Contributor Author

@Flakebi Flakebi left a comment

Thanks for the review!

Some context on the internal workings:

  1. On the CPU side, a program passes arguments to a kernel
  2. The “API” takes these arguments and writes them into GPU memory
  3. The kernel on the GPU gets a pointer to this memory
  4. When the kernel accesses arguments, it reads from this memory

(I think NVIDIA and AMD work the same way here. I’m not too familiar with NVIDIA, but this seems to suggest so: https://github.com/Rust-GPU/rust-cuda/blob/44c44baf6fb738d5ffec25aac5db8af02514e890/crates/rustc_codegen_nvvm/src/abi.rs#L60)

So the number of arguments or the size of arguments doesn’t really matter; it’s all memory anyway.
We could also make struct arguments work (maybe, I didn’t look into the details), but Rust would need to pass them by value; currently it changes them to pass by pointer.
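
The four steps above can be modeled on the host in a few lines. This is only a sketch under assumptions: `KernArgs`, `push`, and `read_args` are hypothetical names, the buffer is an ordinary `Vec<u8>` rather than GPU memory, and the padding rule (align each argument to its own alignment) is an illustrative choice, not a statement about any particular runtime's layout.

```rust
/// Hypothetical host-only model of a kernel argument buffer.
struct KernArgs {
    bytes: Vec<u8>,
}

impl KernArgs {
    fn new() -> Self {
        KernArgs { bytes: Vec::new() }
    }

    /// Step 2: the "API" writes one argument into the buffer, padding up to
    /// the argument's alignment first. Returns the offset of the argument.
    fn push(&mut self, arg: &[u8], align: usize) -> usize {
        let offset = (self.bytes.len() + align - 1) / align * align;
        self.bytes.resize(offset, 0);
        self.bytes.extend_from_slice(arg);
        offset
    }
}

/// Steps 3 and 4: the "kernel" only gets the buffer and reads its
/// arguments back out of that memory at known offsets.
fn read_args(buf: &[u8], off_scale: usize, off_len: usize) -> (f32, u64) {
    let mut scale = [0u8; 4];
    scale.copy_from_slice(&buf[off_scale..off_scale + 4]);
    let mut len = [0u8; 8];
    len.copy_from_slice(&buf[off_len..off_len + 8]);
    (f32::from_ne_bytes(scale), u64::from_ne_bytes(len))
}

fn main() {
    // Step 1: the CPU-side program supplies the arguments.
    let mut args = KernArgs::new();
    let off_scale = args.push(&1.5f32.to_ne_bytes(), 4);
    let off_len = args.push(&1024u64.to_ne_bytes(), 8);

    let (scale, len) = read_args(&args.bytes, off_scale, off_len);
    println!("scale={scale} len={len} buffer={} bytes", args.bytes.len());
}
```

In this model, adding more arguments only grows the buffer, which is why the argument count is not a natural limit here.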

The `gpu-kernel` calling convention has several restrictions that were
not enforced by the compiler until now.
Add the following restrictions:

1. Cannot be async
2. Cannot be called
3. Cannot return values; the return type must be `()` or `!`
4. Arguments should be primitives, i.e. passed by value. More complicated
   types can work, but only if you know the ABI and compiler internals,
   and the resulting behavior is rather unintuitive.
5. Export name should be unmangled, either through `no_mangle` or
   `export_name`. Kernels are looked up by name on the CPU side, so a
   mangled name makes them hard to find and is almost always
   unintentional.