
SHAFFT is a scalable library for high-dimensional complex-to-complex Fast Fourier Transforms (FFTs) in distributed-memory environments. It implements the slab decomposition method introduced by Dalcin, Mortensen, and Keyes (arXiv:1804.09536), using MPI for communication across a Cartesian process topology.
Features
- N-dimensional distributed FFTs with slab decomposition
- 1D distributed FFTs via dedicated FFT1D class
- Flexible process grid topology (1 to N-1 distributed axes)
- Single and double precision complex transforms
- Backend-agnostic design with portable buffer API
- C++, C, and Fortran 2003 interfaces
Supported Backends
| Backend | Target | Description |
|---------|--------|-------------|
| hipFFT | GPU | AMD and NVIDIA GPUs via ROCm/HIP |
| FFTW | CPU | Multi-threaded CPU execution |
Quick Start
Requirements
- CMake >= 3.21
- MPI implementation
- C++17-compatible compiler (GCC >= 10)
- Backend: FFTW3 (CPU) or ROCm/HIP (GPU)
Build
    # FFTW backend (CPU)
    cmake -B build -S . \
      -DSHAFFT_ENABLE_FFTW=ON \
      -DCMAKE_INSTALL_PREFIX=/opt/shafft
    cmake --build build --target install

    # hipFFT backend (GPU)
    cmake -B build -S . \
      -DSHAFFT_ENABLE_HIPFFT=ON \
      -DSHAFFT_GPU_AWARE_MPI=OFF \
      -DCMAKE_PREFIX_PATH=/opt/rocm \
      -DCMAKE_INSTALL_PREFIX=/opt/shafft
    cmake --build build --target install
Example
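    #include <shafft/shafft.hpp>
    #include <mpi.h>

    #include <cstddef>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        // Process-grid extents, one entry per axis (the commDims argument of init()).
        std::vector<int> commDims = {0, 0, 0};
        // Global transform dimensions: 64 x 64 x 32.
        std::vector<size_t> dims = {64, 64, 32};

        MPI_Finalize();
        return 0;
    }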
API summary:
- Plan class (defined at shafft.hpp:51): N-dimensional distributed FFT plan with RAII semantics.
- int init(const std::vector<int>& commDims, const std::vector<size_t>& dimensions, FFTType type, MPI_Comm comm, TransformLayout output = TransformLayout::REDISTRIBUTED) noexcept: Initialize the plan with a Cartesian process grid.
- int normalize() noexcept override: Apply symmetric normalization (1/sqrt(N) per transform).
- int plan() noexcept override: Create the backend FFT plans.
- int execute(FFTDirection direction) noexcept override: Execute the FFT.
- size_t allocSize() const noexcept override: Get the required buffer size in complex elements.
- void release() noexcept override: Release all internal resources.
- int setBuffers(complexf* data, complexf* work) noexcept: Attach the data and work buffers.
- int allocBuffer(size_t count, complexf** buf) noexcept: Allocate a buffer for the current backend.
- int freeBuffer(complexf* buf) noexcept: Free a buffer allocated with allocBuffer().
- complexf (defined at shafft_types.hpp:71): Single-precision complex type, std::complex<float>.
- C2C: Single-precision complex-to-complex (float).
- FORWARD: Forward transform (time to frequency domain).
- BACKWARD: Backward/inverse transform (frequency to time domain).
Validation (Optional)
    cd build
    ctest --output-on-failure
Documentation
Primary documentation:
- Getting Started - installation and build options
- User Guide - API usage for C++, C, and Fortran
- Linking Guide - compile and link instructions
- Backend Reference - backend-specific configuration
- Limitations - current constraints
Build local HTML documentation (optional):
Then open docs/html/index.html in your browser.
License
MIT License. See LICENSE for details.
Citation
If you use SHAFFT in your research, please cite the underlying method:
    @article{dalcin2019fast,
      title={Fast parallel multidimensional FFT using advanced MPI},
      author={Dalcin, Lisandro and Mortensen, Mikael and Keyes, David E},
      journal={Journal of Parallel and Distributed Computing},
      volume={128},
      pages={137--150},
      year={2019},
      doi={10.1016/j.jpdc.2019.02.006}
    }