A hybrid MPI+OpenMP parallel sparse solver benchmark collection for evaluating the performance of iterative sparse linear solvers and sparse matrix-vector operations.
SparseBench provides a comprehensive benchmarking suite for sparse matrix computations with support for multiple matrix formats and iterative solvers. It is designed to measure performance on both single-node and distributed memory systems.
SparseBench supports three different sparse matrix storage formats, each with different performance characteristics:
- CRS (Compressed Row Storage): Standard row-compressed format with three arrays (row pointers, column indices, values). Good general-purpose format.
- SCS (Sell-C-Sigma): SELL-C-σ format optimized for vectorization and cache efficiency. Particularly effective on modern CPUs with SIMD instructions.
- CCRS (Compressed CRS): Compressed variant of CRS format for improved memory efficiency.
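For illustration, here is a minimal sketch of the three-array CRS layout described above, together with a serial SpMV reference kernel over it. The struct and field names are illustrative only, not SparseBench's actual data structures:

```c
#include <stddef.h>

/* Illustrative CRS container; names are hypothetical, not SparseBench's API. */
typedef struct {
    size_t  numRows;  /* number of matrix rows                                 */
    size_t *rowPtr;   /* numRows+1 entries; row i spans rowPtr[i]..rowPtr[i+1] */
    size_t *colInd;   /* column index of each stored nonzero                   */
    double *val;      /* value of each stored nonzero                          */
} CrsMatrix;

/* y = A * x, serial reference kernel for the CRS format. */
static void crsSpmv(const CrsMatrix *A, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < A->numRows; i++) {
        double sum = 0.0;
        for (size_t j = A->rowPtr[i]; j < A->rowPtr[i + 1]; j++) {
            sum += A->val[j] * x[A->colInd[j]];
        }
        y[i] = sum;
    }
}
```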
The following benchmark types are available:
- CG: Conjugate Gradient iterative solver for symmetric positive definite systems.
- SPMV: Sparse Matrix-Vector Multiplication (SpMV) kernel benchmark.
- GMRES: Generalized Minimal Residual method for general sparse systems (planned).
- CHEBFD: Chebyshev Filter Diagonalization (planned).
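To make the per-iteration work of the CG benchmark concrete, here is a generic, unpreconditioned textbook CG sketch (not SparseBench's implementation): each iteration performs one SpMV, two dot products, and three vector updates. The `MatVec` callback and the caller-provided work buffers are assumptions made to keep the example self-contained:

```c
#include <math.h>
#include <stddef.h>

/* Generic textbook CG for a symmetric positive definite system A x = b.
 * matvec(ctx, x, y) computes y = A*x; any sparse format can sit behind it. */
typedef void (*MatVec)(const void *ctx, const double *x, double *y);

static double dot(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* r, p, q are caller-provided work vectors of length n. */
static void cgSolve(MatVec matvec, const void *A, const double *b, double *x,
                    double *r, double *p, double *q, size_t n,
                    int maxIter, double eps)
{
    matvec(A, x, q);                        /* q = A*x0                  */
    for (size_t i = 0; i < n; i++) {
        r[i] = b[i] - q[i];                 /* initial residual          */
        p[i] = r[i];
    }
    double rho = dot(r, r, n);

    for (int k = 0; k < maxIter && sqrt(rho) > eps; k++) {
        matvec(A, p, q);                    /* 1 SpMV per iteration      */
        double alpha = rho / dot(p, q, n);  /* dot product #1            */
        for (size_t i = 0; i < n; i++) {
            x[i] += alpha * p[i];           /* update solution           */
            r[i] -= alpha * q[i];           /* update residual           */
        }
        double rhoNew = dot(r, r, n);       /* dot product #2            */
        double beta = rhoNew / rho;
        for (size_t i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
        rho = rhoNew;
    }
}
```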
For distributed memory execution with MPI, SparseBench uses an optimized communication algorithm that reduces communication to a single exchange operation per iteration. See MPI-Algorithm.md for a detailed explanation of the localization process and communication strategy.
To build and run SparseBench, you need a C compiler and GNU make. MPI is optional but required for distributed memory benchmarks.
- Install a supported compiler:
  - GCC
  - Clang
  - Intel ICX
- Clone the repository:

  ```
  git clone <repository-url>
  cd SparseBench
  ```

- (Optional) Adjust configuration. On first run, `make` will copy
  `mk/config-default.mk` to `config.mk`. Edit `config.mk` to change the
  toolchain, matrix format, enable MPI/OpenMP, etc.
- Build:

  ```
  make
  ```

  See the full Build section for more details.
- Usage:

  ```
  ./sparseBench-<TOOLCHAIN> [options]
  ```

  See the full Usage section for more details.

Get help on command-line arguments:

```
./sparseBench-<TOOLCHAIN> -h
```
Configure the toolchain and additional options in `config.mk`:

```make
# Supported: GCC, CLANG, ICC
TOOLCHAIN ?= CLANG
# Supported: CRS, SCS, CCRS
MTX_FMT ?= CRS
ENABLE_MPI ?= true
ENABLE_OPENMP ?= false
FLOAT_TYPE ?= DP # SP for float, DP for double
UINT_TYPE ?= U   # U for unsigned int, ULL for unsigned long long int
# Feature options
OPTIONS += -DARRAY_ALIGNMENT=64
OPTIONS += -DOMP_SCHEDULE=static
#OPTIONS += -DVERBOSE
#OPTIONS += -DVERBOSE_AFFINITY
#OPTIONS += -DVERBOSE_DATASIZE
#OPTIONS += -DVERBOSE_TIMER
```

- TOOLCHAIN: Compiler to use (GCC, CLANG, ICC)
- MTX_FMT: Sparse matrix storage format (CRS, SCS, CCRS)
- ENABLE_MPI: Enable MPI for distributed memory execution
- ENABLE_OPENMP: Enable OpenMP for shared memory parallelism
- FLOAT_TYPE:
  - SP: Single precision (float)
  - DP: Double precision (double)
- UINT_TYPE:
  - U: Unsigned int for matrix indices
  - ULL: Unsigned long long int for very large matrices
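These two options are typically mapped to C types at compile time. The sketch below shows one plausible way such a mapping can look; the macro and type names (`_SP`, `_ULL`, `CG_FLOAT`, `CG_UINT`) are assumptions for illustration, not necessarily the definitions used in the SparseBench headers:

```c
/* Hypothetical mapping of FLOAT_TYPE / UINT_TYPE build options to C types.
 * Check the SparseBench sources for the actual macro names. */
#ifdef _SP            /* FLOAT_TYPE=SP */
typedef float CG_FLOAT;
#else                 /* FLOAT_TYPE=DP (default) */
typedef double CG_FLOAT;
#endif

#ifdef _ULL           /* UINT_TYPE=ULL, for very large matrices */
typedef unsigned long long int CG_UINT;
#else                 /* UINT_TYPE=U (default) */
typedef unsigned int CG_UINT;
#endif
```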
The verbosity options enable detailed diagnostic output:
- -DVERBOSE: General verbose output
- -DVERBOSE_AFFINITY: Print thread affinity settings and processor bindings
- -DVERBOSE_DATASIZE: Print detailed memory allocation sizes
- -DVERBOSE_TIMER: Print timer resolution information
Build with:

```
make
```

You can build multiple toolchains in the same directory, but the Makefile
only acts on the currently configured one. Intermediate build results are
located in the ./build/<TOOLCHAIN> directory.

To show all executed commands use:

```
make Q=
```

Clean up intermediate build results for the current toolchain with:

```
make clean
```

Clean up all build results for all toolchains with:

```
make distclean
```

Generate assembler files:

```
make asm
```

The assembler files will be located in the ./build/<TOOLCHAIN> directory.

Reformat all source files using clang-format (only works if clang-format
is in your path):

```
make format
```

To run the benchmark, call:
```
./sparseBench-<TOOLCHAIN> [options]
```

| Option | Argument | Description |
|---|---|---|
| `-h` | — | Show help text. |
| `-f` | `<parameter file>` | Load options from a parameter file. |
| `-m` | `<MM matrix>` | Load a Matrix Market (`.mtx`) file. |
| `-c` | `<file name>` | Convert a Matrix Market file to binary matrix format (`.bmx`). |
| `-t` | `<bench type>` | Benchmark type: `cg`, `spmv`, or `gmres`. Default: `cg`. |
| `-x` | `<int>` | Size in x dimension for generated matrix (ignored if loading a file). Default: 100. |
| `-y` | `<int>` | Size in y dimension for generated matrix (ignored if loading a file). Default: 100. |
| `-z` | `<int>` | Size in z dimension for generated matrix (ignored if loading a file). Default: 100. |
| `-i` | `<int>` | Number of solver iterations. Default: 150. |
| `-e` | `<float>` | Convergence criterion epsilon. Default: 0.0. |
SparseBench supports multiple ways to provide input matrices:
- Generated matrices: Use default or specify dimensions with `-x`, `-y`, `-z`:
  - Default mode generates a 3D 7-point stencil matrix
  - Use `-m generate7P` for explicit 7-point stencil generation
- Matrix Market files (`.mtx`): Load standard MatrixMarket format:

  ```
  ./sparseBench-GCC -m matrix.mtx
  ```

- Binary matrix files (`.bmx`): Load pre-converted binary format (MPI builds only):

  ```
  ./sparseBench-GCC -m matrix.bmx
  ```

Convert Matrix Market to binary format:

```
./sparseBench-GCC -c matrix.mtx
```
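For background on the generated matrices mentioned above: the default problem is a 3D 7-point stencil on an nx × ny × nz grid. The sketch below shows how the nonzeros of one such row are typically enumerated; the function, variable names, and coefficient values are illustrative only, not the project's actual generator:

```c
/* Illustrative enumeration of the nonzeros in row (i,j,k) of a 3D 7-point
 * stencil on an nx x ny x nz grid. Interior rows have 7 entries; boundary
 * rows have fewer. Returns the number of nonzeros written for this row. */
static int stencilRow(int i, int j, int k, int nx, int ny, int nz,
                      int *cols, double *vals)
{
    int row = (k * ny + j) * nx + i;  /* lexicographic grid-point numbering */
    int nnz = 0;

    /* Six neighbors plus the diagonal. */
    if (k > 0)      { cols[nnz] = row - nx * ny; vals[nnz++] = -1.0; }
    if (j > 0)      { cols[nnz] = row - nx;      vals[nnz++] = -1.0; }
    if (i > 0)      { cols[nnz] = row - 1;       vals[nnz++] = -1.0; }
                      cols[nnz] = row;           vals[nnz++] =  6.0;
    if (i < nx - 1) { cols[nnz] = row + 1;       vals[nnz++] = -1.0; }
    if (j < ny - 1) { cols[nnz] = row + nx;      vals[nnz++] = -1.0; }
    if (k < nz - 1) { cols[nnz] = row + nx * ny; vals[nnz++] = -1.0; }

    return nnz;
}
```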
Run the CG solver with a generated 100×100×100 matrix:

```
./sparseBench-GCC
```

Run the CG solver with custom dimensions:

```
./sparseBench-GCC -x 200 -y 200 -z 200
```

Run the SpMV benchmark with a Matrix Market file:

```
./sparseBench-GCC -t spmv -m matrix.mtx -i 1000
```

Run the GMRES solver with a convergence criterion:

```
./sparseBench-GCC -t gmres -m matrix.mtx -e 1e-6
```

Run with MPI (4 processes):

```
mpirun -np 4 ./sparseBench-GCC -m matrix.mtx
```

SparseBench uses an optimized MPI communication algorithm for distributed sparse matrix computations. The implementation uses a localization process that transforms the distributed matrix to enable efficient communication:
- All column indices are converted to local indices
- External elements are appended to local vectors
- Communication is reduced to a single `MPI_Neighbor_alltoallv` call per iteration
- Elements from the same source rank are stored consecutively
For a detailed explanation of the algorithm, including the four-step localization process (identify externals, build graph topology, reorder externals, build global index list), see MPI-Algorithm.md.
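A minimal sketch of the communication pattern this enables is shown below: a graph communicator built once from the neighbor ranks determined during localization, and a single `MPI_Neighbor_alltoallv` per iteration that gathers the required remote vector entries into the tail of the local vector. Buffer and function names are illustrative assumptions; see MPI-Algorithm.md and the sources for the actual implementation.

```c
#include <mpi.h>

/* Created once: a distributed graph communicator over the neighbor ranks. */
MPI_Comm buildNeighborComm(MPI_Comm comm, int numNeighbors, const int *neighbors)
{
    MPI_Comm neighborComm;
    MPI_Dist_graph_create_adjacent(comm,
                                   numNeighbors, neighbors, MPI_UNWEIGHTED,
                                   numNeighbors, neighbors, MPI_UNWEIGHTED,
                                   MPI_INFO_NULL, 0 /* no reordering */,
                                   &neighborComm);
    return neighborComm;
}

/* Per iteration: one neighborhood collective. The packed send buffer holds the
 * owned entries each neighbor needs; the received external entries land
 * directly behind the owned part of x, so the SpMV can index them locally. */
void exchangeHalo(MPI_Comm neighborComm,
                  const double *sendBuf, const int *sendCounts, const int *sendDispls,
                  double *x, int numLocalRows,
                  const int *recvCounts, const int *recvDispls)
{
    MPI_Neighbor_alltoallv(sendBuf, sendCounts, sendDispls, MPI_DOUBLE,
                           x + numLocalRows, recvCounts, recvDispls, MPI_DOUBLE,
                           neighborComm);
}
```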
The tests directory contains unit tests for various components:
- Matrix format tests
- Solver validation tests
- Communication tests (MPI builds)
To run all tests:
```
cd tests
make
./runTests
```

For OpenMP execution, we recommend using likwid-pin to set the number of
threads and control thread affinity, which is important for optimal
performance:

```
likwid-pin -C 0-7 ./sparseBench-GCC
```

If likwid-pin is not available, you can use standard OpenMP environment
variables:

```
export OMP_NUM_THREADS=8
export OMP_PROC_BIND=close
export OMP_PLACES=cores
./sparseBench-GCC
```

The Makefile will generate a .clangd configuration to correctly set all
options for the clang language server. This is only important if you use an
editor with LSP support and want to edit or explore the source code.
GNU Make 4.0 or newer is required for this feature. Older make versions will
still build the project, but the generation of the .clangd configuration for
the clang language server will not work. The default Make version included in
macOS is 3.81! Newer make versions can easily be installed on macOS using the
Homebrew package manager.
An alternative is to use Bear, a tool that generates a compilation database for clang tooling. This method also enables jumping to any definition without a previously opened buffer. You have to build SparseBench once with Bear as a wrapper:

```
bear -- make
```

Copyright © NHR@FAU, University Erlangen-Nuremberg.
This project is licensed under the MIT License - see the LICENSE file for details.