
Conversation

@luraess (Contributor) commented Dec 17, 2025

Adds info to the docs as per the discussion in #924.

@luraess changed the title from "Improve GPU-aware section" to "Improve GPU-aware section in the docs" on Dec 17, 2025
> ```julia
> comm_loc = MPI.Comm_split_type(comm, MPI.COMM_TYPE_SHARED, rank)
> rank_loc = MPI.Comm_rank(comm_loc)
> ```
> If using (2), one can use the default device but make sure to handle device visibility in the scheduler; for SLURM on Cray systems, this can be mostly achieved using `--gpus-per-task=1`.
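For context, a minimal sketch of the other option, explicitly selecting the device from the node-local rank (assuming CUDA.jl here; AMDGPU.jl would be used analogously, and the modulo wrap is purely illustrative):

```julia
using MPI, CUDA

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Split the world communicator into node-local groups; the node-local rank
# then selects a distinct GPU for each rank sharing a node.
comm_loc = MPI.Comm_split_type(comm, MPI.COMM_TYPE_SHARED, rank)
rank_loc = MPI.Comm_rank(comm_loc)
CUDA.device!(rank_loc % length(CUDA.devices()))
```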


Doesn't `--gpus-per-task` for SLURM prevent the use of GPU peer-to-peer IPC mechanisms (https://cpe.ext.hpe.com/docs/24.03/mpt/mpich/intro_mpi.html), which would have a negative impact on performance?

Member:

Yeah that's what I also remember, but perhaps Nvidia has finally fixed this?


Nope, not as far as I can tell.

Contributor Author:

Thanks for pointing this out. I can make the text more generic.

Comment on lines 85 to 87
Successfully running the [alltoall\_test\_cuda.jl](https://gist.github.com/luraess/0063e90cb08eb2208b7fe204bbd90ed2)
should confirm that your MPI implementation has CUDA support enabled. Moreover, successfully running the
[alltoall\_test\_cuda\_multigpu.jl](https://gist.github.com/luraess/ed93cc09ba04fe16f63b4219c1811566) should confirm
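As a quick, lighter-weight sanity check (a sketch, not a replacement for the gists above), one can also query whether the MPI build reports CUDA support, assuming MPI.jl is configured against the system MPI:

```julia
using MPI

MPI.Init()
# `true` here means the underlying MPI library advertises CUDA awareness,
# which is a prerequisite for the alltoall device-buffer tests to pass.
@show MPI.has_cuda()
MPI.Finalize()
```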
Member:

Can we move the files into this repository?

Contributor Author:

Yes, we can.

Contributor Author:

Where shall one put them?

Comment on lines 103 to 108
!!! note "Preloads"
On Cray machines, you may need to ensure the following preloads to be set in the preferences:
```
preloads = ["libmpi_gtl_hsa.so"]
preloads_env_switch = "MPICH_GPU_SUPPORT_ENABLED"
```
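For illustration, these preferences could be set programmatically roughly as follows (a sketch assuming an MPIPreferences.jl version that exposes the `preloads` and `preloads_env_switch` keywords; editing `LocalPreferences.toml` by hand achieves the same):

```julia
using MPIPreferences

# Use the system (Cray MPICH) binary and request the GTL preload, gated on
# the MPICH_GPU_SUPPORT_ENABLED environment variable.
MPIPreferences.use_system_binary(;
    preloads = ["libmpi_gtl_hsa.so"],
    preloads_env_switch = "MPICH_GPU_SUPPORT_ENABLED",
)
```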
Member:

This is also true for CUDA.

> preloads_env_switch = "MPICH_GPU_SUPPORT_ENABLED"
> ```
>
> !!! note "Multiple GPUs per node"
Member:

Suggested change:
- !!! note "Multiple GPUs per node"
+ ### "Multiple GPUs per node"

Since the text is not only about ROCm?
