BNL-HPC/openmp-benchmarks

Microbenchmarking OpenMP target offload with Catch2

Build instructions for various machines

Currently, for OpenMP target offload the --offload-arch flag must match the targeted GPU architecture: either edit it in CMakeLists.txt or supply it through CMAKE_CXX_FLAGS.
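
If you are unsure which GPU architecture a node provides, the following queries usually report it (compute_cap needs a reasonably recent NVIDIA driver and maps directly to the sm_ value, e.g. 8.0 is sm_80; rocminfo ships with ROCm):

nvidia-smi --query-gpu=compute_cap --format=csv,noheader
rocminfo | grep gfx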

After upgrading to Catch2 v3.x, specify the path to an existing Catch2 installation via -DCatch2_ROOT=/path/to/Catch2; otherwise CMake will add it as a dependency.
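
For example, pointing CMake at a pre-built Catch2 (the paths and architecture below are placeholders):

cmake -S . -B build -DCatch2_ROOT=/path/to/Catch2 -DCMAKE_CXX_FLAGS="--offload-arch=<arch> -O3"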

NERSC Perlmutter

Set --offload-arch=sm_80 for A100

module load llvm/16 cudatoolkit/11.7 cmake/3.24.3

cmake -S . -B build-clang16-perlmutter -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_CXX_FLAGS="--offload-arch=sm_80 -O3 -mtune=native -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64 -lcudart -lcudart_static -ldl -lrt -pthreads"

cmake --build build-clang16-perlmutter --parallel 16 --verbose
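
To run on a Perlmutter GPU node, launch through Slurm from inside an allocation; the srun flags and input file below are illustrative, not prescriptive:

srun -n 1 -G 1 ./build-clang16-perlmutter/saxpy/saxpy_omp_app --benchmark-samples 1000 --benchmark-resamples 100 --benchmark-confidence-interval 0.95 --input-file inputfile --benchmark-warmup-time 10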

OLCF Frontier

Set --offload-arch=gfx90a for MI250X

module load rocm/5.4.3 cmake craype-accel-amd-gfx90a

cmake -S . -B build-clang15-frontier -DCMAKE_C_COMPILER=amdclang -DCMAKE_CXX_COMPILER=amdclang++ -DCMAKE_CXX_FLAGS="-O3 -mtune=native --offload-arch=gfx90a" -DCMAKE_PREFIX_PATH=""

cmake --build build-clang15-frontier --parallel 16 --verbose
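
Likewise on Frontier, run from inside a Slurm allocation; the srun flags and input file below are only a sketch:

srun -N 1 -n 1 --gpus=1 ./build-clang15-frontier/saxpy/saxpy_omp_app --benchmark-samples 1000 --benchmark-resamples 100 --benchmark-confidence-interval 0.95 --input-file inputfile --benchmark-warmup-time 10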

BNL CSI HPC Dahlia

/home/atif/packages/cmake-3.30.0-rc2-linux-x86_64/bin/cmake -B build-dahlia/ -S . -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DCatch2_ROOT=/home/atif/openmp-benchmarks/Catch22 -DCMAKE_CXX_FLAGS="--offload-arch=sm_86"

/home/atif/packages/cmake-3.30.0-rc2-linux-x86_64/bin/cmake --build build-dahlia/ --parallel 16

./build-dahlia/saxpy/saxpy_omp_app --benchmark-samples 1000 --benchmark-resamples 100 --benchmark-confidence-interval 0.95 --input-file inp-omp --benchmark-warmup-time 10 -r tabular

BNL CSI HPC Peony

/home/atif/packages/cmake-3.30.0-rc2-linux-x86_64/bin/cmake -B build-peony/ -S . -DCatch2_ROOT=/home/atif/openmp-benchmarks/Catch22 -DCMAKE_CXX_FLAGS="--offload-arch=gfx90a"

/home/atif/packages/cmake-3.30.0-rc2-linux-x86_64/bin/cmake --build build-peony/ --parallel 16

./build-peony/saxpy/saxpy_omp_app --benchmark-samples 1000 --benchmark-resamples 100 --benchmark-confidence-interval 0.95 --input-file inp-omp --benchmark-warmup-time 10 -r tabular

BNL Institutional Cluster

Set --offload-arch=sm_37 for K80

Set --offload-arch=sm_60 for P100

module load git/2.11.1 cmake/3.23.1 llvm/13.0.1

cmake -S . -B build -DCMAKE_INSTALL_PREFIX=install -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O3"

cmake --build build --parallel 8
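
Note that the configure line above does not set --offload-arch; append it to CMAKE_CXX_FLAGS for the GPU you are targeting, for example for the P100 nodes:

cmake -S . -B build -DCMAKE_INSTALL_PREFIX=install -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O3 --offload-arch=sm_60"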

BNL CSI alpha1/lambda1/lambda2/lambda3/lambda4

Set --offload-arch=sm_80 for alpha1 A30

Set --offload-arch=sm_70 for lambda1 V100

Set --offload-arch=sm_86 for lambda2 A6000

Set --offload-arch=gfx906 for lambda2 Vega20

Set --offload-arch=gfx906 for lambda3

Set --offload-arch=sm_75 for lambda4 2080Ti

module use /work/software/modulefiles

module load nvhpc/22.9

export PATH=/work/software/wc/llvm-16-test/bin/:$PATH

export LD_LIBRARY_PATH=/work/software/wc/llvm-16-test/lib/:$LD_LIBRARY_PATH
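
Before configuring, a quick check that the exported LLVM 16 toolchain is the one picked up from PATH:

which clang++
clang++ --version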

/work/atif/packages/cmake-3.25.0-linux-x86_64/bin/cmake -S . -B build -DCMAKE_INSTALL_PREFIX=install -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O3 --offload-arch=sm_86 -mtune=native" -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DCMAKE_PREFIX_PATH=""

/work/atif/packages/cmake-3.25.0-linux-x86_64/bin/cmake --build build --parallel 8

Running a microbenchmark

./build/saxpy/saxpy_omp_app --benchmark-samples 1000 --benchmark-resamples 100 --benchmark-confidence-interval 0.95 --input-file inputfile --benchmark-warmup-time 10
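
All of Catch2's standard command-line options are available as well; --help on any of the benchmark binaries lists them:

./build/saxpy/saxpy_omp_app --help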
