This repository contains GPU and CPU implementations of sparse matrix iterative solvers with ILU0 and ILUK preconditioning.
This project provides three main executable targets:
- ilu0_gpu - GPU-accelerated ILU0 preconditioned conjugate gradient solver
- iluk_gpu - GPU-accelerated ILUK preconditioned conjugate gradient solver
- ilu0_cpu - CPU-based ILU0 preconditioned conjugate gradient solver
For GPU targets:
- CUDA Toolkit (version 11.0 or later)
- CMake 3.20 or later
- C++17 compatible compiler
- Git
For CPU target:
- OpenMP
- Eigen3 library
- CMake 3.0 or later
- C++17 compatible compiler
For matrix preparation:
- Python 3.x
- Required Python packages:
pip install numpy scipy pandas matplotlib ssgetpy
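Before installing anything, it can help to see which of the tools listed above are already on PATH. The following is a minimal sketch; the helper name missing_tools is hypothetical and not part of this repository.

```shell
# Report which of the given command-line tools are missing from PATH.
# Prints one missing tool name per line; prints nothing if all are found.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools cmake git python3 nvcc` prints the names of any of those four commands that cannot be found. It only checks that a command exists, not that its version meets the minimums above.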
On Ubuntu/Debian:
# Install basic development tools
sudo apt update
sudo apt install cmake build-essential git
# Install Eigen3 and OpenMP
sudo apt install libeigen3-dev libomp-dev
# Install CUDA Toolkit (follow NVIDIA's official guide)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu20.04/x86_64/cuda-ubuntu20.04.pin
sudo mv cuda-ubuntu20.04.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-ubuntu20.04-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu20.04-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu20.04-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt install cuda
On CentOS/RHEL:
# Install development tools
sudo yum groupinstall "Development Tools"
sudo yum install cmake3 eigen3-devel
# Install CUDA (follow NVIDIA's official guide)
# Download and install CUDA toolkit from NVIDIA website
Before building and running the targets, you need to prepare the matrix data:
Create a directory structure and download matrices from the SuiteSparse Matrix Collection:
# Create matrices directory
mkdir -p matrices
# Download matrices using the provided script
cd script_src/python_scripts/prep
python3 matrix_download.py
cd ../../..
This downloads more than 100 test matrices from the SuiteSparse collection into the matrices/ directory.
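To confirm that the download finished, one option is to count the Matrix Market files that arrived. This sketch assumes the matrices are stored as .mtx files somewhere below matrices/, as the 1138_bus example paths later in this document suggest; count_mtx is a hypothetical helper name.

```shell
# Count Matrix Market (.mtx) files anywhere below the given directory.
count_mtx() {
    find "$1" -type f -name '*.mtx' | wc -l | tr -d ' '
}
```

Running `count_mtx matrices` from the repository root right after the download should print a number above 100. Once sparsified matrices have been generated, they are counted as well if they are also saved with the .mtx extension.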
Generate sparsified versions of the matrices for testing different sparsification ratios:
# Generate sparsified matrices (may take several hours)
cd script_src/matlab_scripts
matlab -batch "matrix_sparsification"
cd ../..
Compute various matrix properties needed for the algorithms:
# Compute matrix properties
cd script_src/matlab_scripts
matlab -batch "matrix_sparsification_analysis"
cd ../..
For the ILUK GPU target, you need to pre-generate factorization data:
# Create factors directory
mkdir -p factors/timing
# Generate ILUK factorization data
cd script_src/python_scripts/prep/iluk_factorization
python3 iluk_factorize.py ../../../../matrices/1138_bus/1138_bus.mtx --export
cd ../../../..
Repeat this process for all matrices you want to test with ILUK.
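Repeating the command by hand gets tedious for many matrices, so a loop may help. The sketch below assumes every matrix lives at <name>/<name>.mtx inside the matrices directory, as in the 1138_bus example; the function names list_main_matrices and factorize_all are hypothetical.

```shell
# Print the main .mtx file of every matrix directory under $1,
# assuming the <name>/<name>.mtx layout (an assumption, see above).
list_main_matrices() {
    local dir name
    for dir in "$1"/*/; do
        name="$(basename "$dir")"
        [ -f "${dir}${name}.mtx" ] && printf '%s\n' "${dir}${name}.mtx"
    done
    return 0
}

# Run the factorization command shown above on each matrix found.
# Call this from script_src/python_scripts/prep/iluk_factorization.
factorize_all() {
    list_main_matrices "$1" | while IFS= read -r mtx; do
        python3 iluk_factorize.py "$mtx" --export || echo "failed: $mtx" >&2
    done
}
```

Call factorize_all with the path to the matrices directory as its only argument. Files that do not follow the <name>/<name>.mtx pattern, such as sparsified variants, are skipped, and a failed factorization is reported on stderr without stopping the loop.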
The ILU0 GPU implementation has two variants: non-sparsified (nonsp) and sparsified (sp).
Build non-sparsified version:
cd gpu_src/ilu0_gpu/nonsp
mkdir -p build
cd build
cmake ..
make -j$(nproc)
cd ../../../..
Build sparsified version:
cd gpu_src/ilu0_gpu/sp
mkdir -p build
cd build
cmake ..
make -j$(nproc)
cd ../../../..
Build the ILUK GPU target:
cd gpu_src/iluk_gpu
mkdir -p build
cd build
cmake ..
make -j$(nproc)
cd ../../..
Build the CPU target:
cd cpu_src
mkdir -p build
cd build
cmake ..
make -j$(nproc)
cd ../..
Run the non-sparsified ILU0 GPU solver:
cd gpu_src/ilu0_gpu/nonsp/build
./conjugateGradientPrecond ../../../../matrices/1138_bus/1138_bus.mtx
Run the sparsified ILU0 GPU solver:
cd gpu_src/ilu0_gpu/sp/build
./conjugateGradientPrecond ../../../../matrices/1138_bus/1138_bus.mtx ../../../../matrices/1138_bus/1138_bus.mtx <sparsification ratio>
The sparsification ratio can be 0.01, 0.05, 0.1, etc., as long as it matches a ratio that the matrix_sparsification.m script produces.
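To check that a given ratio was actually produced before launching a run, one can test for the sparsified file. This sketch assumes the naming pattern <name>_sp_<ratio>.mtx, inferred only from the 1138_bus_sp_0.05.mtx path used in the CPU example; the helper names sp_matrix_path and has_sp_matrix are hypothetical.

```shell
# Build the path of a sparsified matrix from the matrices directory ($1),
# the matrix name ($2), and the ratio ($3).
# The <name>_sp_<ratio>.mtx pattern is an assumption, not a documented rule.
sp_matrix_path() {
    printf '%s/%s/%s_sp_%s.mtx\n' "$1" "$2" "$2" "$3"
}

# Succeed only if the sparsified file for this ratio exists.
has_sp_matrix() {
    [ -f "$(sp_matrix_path "$@")" ]
}
```

For instance, `has_sp_matrix ../../../../matrices 1138_bus 0.05 || echo "ratio not generated"` warns when the file is missing. The ratio is matched as text, so 0.05 and 0.050 count as different ratios.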
Run the ILUK GPU solver:
cd gpu_src/iluk_gpu/build
./conjugateGradientPrecond ../../../matrices/1138_bus/1138_bus.mtx <path to lower factor> <path to upper factor>
Run the CPU solver:
cd cpu_src/build
./conjugateGradientPrecond ../../matrices/1138_bus/1138_bus.mtx ../../matrices/1138_bus/1138_bus_sp_0.05.mtx
- Set an appropriate OpenMP thread count:
export OMP_NUM_THREADS=8
- For optimal performance, pin threads to specific cores:
export OMP_PLACES=cores
export OMP_PROC_BIND=close
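The value 8 above is only an example. Requesting more threads than the machine has processing units usually hurts rather than helps, so a small guard can cap the request. This is a sketch; clamp_threads is a hypothetical helper, and it relies on nproc from GNU coreutils, which counts logical processing units rather than physical cores.

```shell
# Print the requested thread count ($1), capped at the number of
# processing units reported by nproc.
clamp_threads() {
    local want="$1" have
    have="$(nproc)"
    if [ "$want" -gt "$have" ]; then
        printf '%s\n' "$have"
    else
        printf '%s\n' "$want"
    fi
}

export OMP_NUM_THREADS="$(clamp_threads 8)"
export OMP_PLACES=cores
export OMP_PROC_BIND=close
```

On machines with simultaneous multithreading, the number of physical cores is lower than what nproc reports, so a smaller value may still be the better choice when OMP_PLACES=cores is set.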
- Matrix file not found:
  - Verify matrix paths are correct relative to executable location
  - Ensure matrices were downloaded properly
- GPU memory errors:
  - Check available GPU memory:
    nvidia-smi
  - Try smaller matrices first
  - Reduce batch size
- Performance issues:
  - Verify OpenMP is properly linked for CPU version
  - Check CUDA installation for GPU versions
  - Monitor system resources during execution
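The first two troubleshooting checks can be scripted. The sketch below verifies that a path is readable and starts with the %%MatrixMarket banner line required by the Matrix Market format, and prints per-GPU memory usage. The function names check_matrix_file and gpu_memory are hypothetical helpers, not part of this repository.

```shell
# Check that a path points to a readable Matrix Market file.
# Prints a short diagnosis and returns non-zero on failure.
check_matrix_file() {
    if [ ! -r "$1" ]; then
        echo "not found or unreadable: $1 (cwd: $(pwd))"
        return 1
    fi
    case "$(head -n 1 "$1")" in
        '%%MatrixMarket'*) echo "ok: $1" ;;
        *) echo "missing %%MatrixMarket banner: $1"; return 1 ;;
    esac
}

# Print name, used memory, and total memory for each GPU,
# if nvidia-smi is available.
gpu_memory() {
    command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not found"; return 1; }
    nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv
}
```

Running check_matrix_file from the same build directory as the solver, with the same relative path, shows whether the path resolves from that location; the printed working directory helps spot a wrong number of ../ levels.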