issues with test of CUDA-aware MPI support #924

@aklocker42

Description

I am currently trying to test CUDA-aware MPI on this machine, using Nvidia GH200s.

I first tried the alltoall_test_cuda.jl test recommended here, which works fine. But when I move on to testing multiple GPUs with alltoall_test_cuda_multigpu.jl, I run into the following error:

rank=0 rank_loc=0 (gpu_id=CuDevice(0)), size=4, dst=1, src=3
ERROR: LoadError: CUDA error: invalid device ordinal (code 101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] throw_api_error(res::CUDA.cudaError_enum)
   @ CUDA /cluster/projects/nn9874k/aklocker/juliaup/depot/packages/CUDA/x8d2s/lib/cudadrv/libcuda.jl:30
 [2] check
   @ /cluster/projects/nn9874k/aklocker/juliaup/depot/packages/CUDA/x8d2s/lib/cudadrv/libcuda.jl:37 [inlined]
 [3] cuDeviceGet
   @ /cluster/projects/nn9874k/aklocker/juliaup/depot/packages/GPUToolbox/JLBB1/src/ccalls.jl:33 [inlined]
 [4] CuDevice(ordinal::Int64)
   @ CUDA /cluster/projects/nn9874k/aklocker/juliaup/depot/packages/CUDA/x8d2s/lib/cudadrv/devices.jl:17
 [5] device!
   @ /cluster/projects/nn9874k/aklocker/juliaup/depot/packages/CUDA/x8d2s/lib/cudadrv/state.jl:324 [inlined]
 [6] device!(dev::Int64)
   @ CUDA /cluster/projects/nn9874k/aklocker/juliaup/depot/packages/CUDA/x8d2s/lib/cudadrv/state.jl:324
 [7] top-level scope
   @ ~/alltoall_test_cuda_multigpu.jl:9
in expression starting at /cluster/home/aklocker/alltoall_test_cuda_multigpu.jl:9

(The same error and stacktrace are printed by each of the three failing ranks; I have deduplicated the interleaved output here.)
srun: error: gpu-1-1: tasks 1-3: Exited with exit code 1
srun: Terminating StepId=58810.0
[2025-12-11T09:52:48.778] error: *** STEP 58810.0 ON gpu-1-1 CANCELLED AT 2025-12-11T09:52:48 DUE TO TASK FAILURE ***
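For context, the stacktrace points at a `device!(::Int64)` call on line 9 of the test script, i.e. at the step where each rank picks a GPU from its node-local rank. A minimal, defensive sketch of that pattern (my own variant, not the exact test script; `pick_device` is a hypothetical helper name) that wraps the requested ordinal around the number of devices the process can actually see, so it cannot ask for a nonexistent device:

```julia
using MPI, CUDA

# Pure mapping: node-local rank -> visible device ordinal (0-based).
# Wrapping with mod avoids "invalid device ordinal" when there are more
# local ranks than visible devices.
pick_device(rank_loc::Int, ndev::Int) = mod(rank_loc, ndev)

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Ranks sharing a node (shared-memory domain) get consecutive local ranks.
comm_loc = MPI.Comm_split_type(comm, MPI.COMM_TYPE_SHARED, rank)
rank_loc = MPI.Comm_rank(comm_loc)

ndev = length(CUDA.devices())  # counts only devices visible to this process
CUDA.device!(pick_device(rank_loc, ndev))
println("rank=$rank rank_loc=$rank_loc using ", CUDA.device())
```

With 4 ranks on a node where each process sees only 2 devices (as in the versioninfo below), the plain `device!(rank_loc)` pattern would request ordinals 2 and 3 and fail exactly like the log above, while the wrapped version maps them back onto devices 0 and 1.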

My system info:

julia> CUDA.versioninfo()
CUDA toolchain: 
- runtime 12.6, local installation
- driver 565.57.1 for 13.0
- compiler 12.9

CUDA libraries: 
- CUBLAS: 12.6.3
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 12.6.0)
- NVML: 12.0.0+565.57.1

Julia packages: 
- CUDA: 5.9.5
- CUDA_Driver_jll: 13.0.2+0
- CUDA_Compiler_jll: 0.3.0+0
- CUDA_Runtime_jll: 0.19.2+0
- CUDA_Runtime_Discovery: 1.0.0

Toolchain:
- Julia: 1.10.10
- LLVM: 15.0.7

Environment:
- JULIA_CUDA_MEMORY_POOL: none
- JULIA_CUDA_USE_BINARYBUILDER: false

Preferences:
- CUDA_Runtime_jll.local: true

2 devices:
  0: NVIDIA GH200 120GB (sm_90, 94.997 GiB / 95.577 GiB available)
  1: NVIDIA GH200 120GB (sm_90, 94.997 GiB / 95.577 GiB available)

and

julia> MPI.versioninfo()
MPIPreferences:
  binary:  system
  abi:     MPICH
  libmpi:  libmpi_cray.so
  mpiexec: ["srun", "-C", "gpu"]

Package versions
  MPI.jl:             0.20.23
  MPIPreferences.jl:  0.1.11

Library information:
  libmpi:  libmpi_cray.so
  libmpi dlpath:  /opt/cray/pe/lib64/libmpi_cray.so
  MPI version:  3.1.0
  Library version:  
    MPI VERSION    : CRAY MPICH version 8.1.32.110 (ANL base 3.4a2)
    MPI BUILD INFO : Thu Feb 06 22:42 2025 (git hash f9c5634-dirty)
    
  MPI launcher: srun
  MPI launcher path: /usr/bin/srun
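One thing I have not yet ruled out: on SLURM systems the launcher can export a per-task CUDA_VISIBLE_DEVICES, so an individual rank may legally see fewer devices than the node has, even though `CUDA.versioninfo()` in an interactive session shows both GH200s. A small diagnostic sketch I could run with the same `srun -C gpu` line as the failing test (my own snippet, not from the test suite):

```julia
using MPI, CUDA

MPI.Init()
rank = MPI.Comm_rank(MPI.COMM_WORLD)

# What did the launcher hand this process?
vis = get(ENV, "CUDA_VISIBLE_DEVICES", "unset")
println("rank=$rank sees $(length(CUDA.devices())) device(s), ",
        "CUDA_VISIBLE_DEVICES=$vis")

MPI.Finalize()
```

If each rank reports fewer visible devices than its node-local rank expects, that would explain the invalid-ordinal error above.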

Could anyone point me in the right direction as to what is going wrong here? I'm relatively new to Julia and CUDA, so any help would be much appreciated!
