This document provides instructions for building the AOCL-DLP library from source code.
Before building AOCL-DLP, ensure your system meets the following requirements:
- CMake (≥ 3.26)
- C/C++ compiler with C11/C++17 support (e.g., GCC 11+, Clang 14+)
- OpenMP library (for multi-threading)
- ninja-build (optional, for Ninja generator support)
Note: GCC 11 introduced AVX512_BF16 support, which is required for bfloat16 GEMM.
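As a quick sanity check of the CMake requirement above, the following sketch compares the installed CMake version against 3.26. The helper name `version_ge` is hypothetical (not part of AOCL-DLP), and the comparison assumes a `sort` that supports `-V` (GNU coreutils and recent BSD variants).

```shell
# Sketch: check that the installed CMake meets the 3.26 minimum.
# version_ge is a hypothetical helper, not part of AOCL-DLP.

version_ge() {
    # Succeeds when dotted version $1 is greater than or equal to $2.
    # Relies on `sort -V` (version sort) placing the smaller version first.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

required="3.26"
if command -v cmake >/dev/null 2>&1; then
    # `cmake --version` prints "cmake version X.Y.Z" on its first line.
    found="$(cmake --version | head -n1 | awk '{print $3}')"
    if version_ge "$found" "$required"; then
        echo "cmake $found: OK (>= $required)"
    else
        echo "cmake $found: too old, need >= $required"
    fi
else
    echo "cmake not found in PATH"
fi
```

The same helper could be reused for compiler versions, for example by feeding it the output of `gcc -dumpversion`.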
- x86 CPU with AVX2/FMA3 support
- AVX512 support for enhanced performance
- AVX512_VNNI support for int8 GEMM
- AVX512_BF16 support for bfloat16 GEMM
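To see which of these instruction sets a machine advertises, a small script along the following lines may help. It is a Linux-only sketch: it assumes `/proc/cpuinfo` is available and uses the kernel's flag spellings (`avx2`, `fma`, `avx512f`, `avx512_vnni`, `avx512_bf16`). The function names are illustrative.

```shell
# Sketch: report which ISA extensions relevant to AOCL-DLP the CPU advertises.
# Linux-only; cpu_flags and has_flag are hypothetical helper names.

cpu_flags() {
    # Print the flag list of the first logical CPU (empty if unavailable).
    grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2 || true
}

has_flag() {
    # Usage: has_flag <flag> "<space-separated flag list>"
    # Padding with spaces ensures whole-word matches (avx does not match avx2).
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

flags="$(cpu_flags)"
for f in avx2 fma avx512f avx512_vnni avx512_bf16; do
    if has_flag "$f" "$flags"; then
        echo "$f: yes"
    else
        echo "$f: no"
    fi
done
```

On systems without `/proc/cpuinfo` the flag list is empty and every entry reports `no`, so the output is only meaningful on Linux.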
AOCL-DLP uses CMake for its build system with several configurable options:
| Option | Default | Description |
|---|---|---|
| General Build Options | ||
| BUILD_EXAMPLES | OFF | Build example programs |
| BUILD_BENCHMARKS | OFF | Build benchmark programs |
| BUILD_TESTING | OFF | Build test programs (requires DLP_TESTING_CTEST_DISABLED=OFF for CTest) |
| BUILD_DOXYGEN | OFF | Build Doxygen documentation |
| BUILD_SPHINX | OFF | Build Sphinx documentation |
| CMAKE_EXPORT_COMPILE_COMMANDS | OFF | Generate compile_commands.json for tooling |
| CMAKE_BUILD_TYPE | Release | Build type ("Release", "Debug", "RelWithDebInfo", "Coverage") |
| CMAKE_INSTALL_PREFIX | /usr/local | Installation directory |
| Compiler Options | ||
| CMAKE_CXX_COMPILER | system | Specify C++ compiler (e.g., g++) |
| CMAKE_C_COMPILER | system | Specify C compiler (e.g., gcc) |
| Threading & Sanitizers | ||
| DLP_THREADING_MODEL | "none" | Threading model ("none", "openmp") |
| DLP_ENABLE_OPENMP | ON | Override OpenMP support (auto-enabled by threading model) |
| DLP_OPENMP_ROOT | "" | Custom path to OpenMP installation |
| DLP_USE_LLVM_OPENMP | OFF | Force using LLVM OpenMP implementation |
| DLP_ENABLE_ASAN | OFF | Enable AddressSanitizer |
| DLP_ENABLE_TSAN | OFF | Enable ThreadSanitizer |
| DLP_ENABLE_UBSAN | OFF | Enable UndefinedBehaviorSanitizer |
| Testing Options | ||
| DLP_TESTING_CTEST_DISABLED | ON | Disable CTest integration (set to OFF to enable with BUILD_TESTING) |
| DLP_TESTING_LINK_STATIC | ON | Link tests with static AOCL-DLP library for better performance |
| DLP_TESTING_ENABLE_DETAILED_DEBUG | OFF | Enable detailed debug information for tests |
| Benchmarking Options | ||
| DLP_BENCHMARKS_LINK_STATIC | ON | Link benchmarks with static AOCL-DLP library for better performance |
| Build Target Options | ||
| DLP_EXAMPLES_LINK_STATIC | ON | Link examples with static AOCL-DLP library for better performance |
| Kernel Dispatch Table | ||
| DLP_KDT_TABLE_SIZE | 16 | Set table size for the Kernel Dispatch Table |
| DLP_KDT_CHAIN_SIZE | 128 | Set table chain size for the Kernel Dispatch Table |
Note:
- Options can be set via -D<option>=<value> when invoking cmake.
- Some options (like -GNinja) are passed as command-line arguments, not as variables.
- For a full list of options, see the modular cmake files: cmake/dlp_core_options.cmake, cmake/dlp_testing.cmake, cmake/dlp_benchmark.cmake, cmake/dlp_build_options.cmake, and cmake/dlp_documentation.cmake.
- Clone and enter project:
  git clone <repository-url> && cd aocl-dlp
- Create an out-of-source build directory:
  mkdir -p build && cd build
- Configure (choose generator):
  # Default (GNU Make)
  cmake ..
  # Ninja (fast incremental builds)
  cmake -G Ninja ..
- Build:
  # Make
  make -j$(nproc)
  # Ninja
  ninja
- For installation instructions, see INSTALL.md.
To enable benchmarks:
cmake -DBUILD_BENCHMARKS=ON ..
To enable testing with full CTest integration:
cmake -DBUILD_TESTING=ON -DDLP_TESTING_CTEST_DISABLED=OFF ..
Note: Both BUILD_TESTING=ON and DLP_TESTING_CTEST_DISABLED=OFF are required for full CTest integration. Using only BUILD_TESTING=ON builds tests but uses traditional testing instead of Google Test discovery.
AOCL-DLP supports the following threading models:
# No threading (default)
cmake -DDLP_THREADING_MODEL=none ..
# Enable OpenMP threading
cmake -DDLP_THREADING_MODEL=openmp ..
Note: Setting DLP_THREADING_MODEL=openmp automatically enables OpenMP support. The separate DLP_ENABLE_OPENMP option (default: ON) provides additional control and can disable OpenMP entirely with -DDLP_ENABLE_OPENMP=OFF.
For custom OpenMP installation:
cmake -DDLP_THREADING_MODEL=openmp -DDLP_OPENMP_ROOT=/path/to/openmp ..
By default, tests, benchmarks, and examples are linked with the static AOCL-DLP library for better performance. You can control this behavior:
Enable static linking (default):
cmake -DBUILD_EXAMPLES=ON -DDLP_EXAMPLES_LINK_STATIC=ON ..
cmake -DBUILD_TESTING=ON -DDLP_TESTING_LINK_STATIC=ON ..
cmake -DBUILD_BENCHMARKS=ON -DDLP_BENCHMARKS_LINK_STATIC=ON ..
Use dynamic linking:
cmake -DBUILD_EXAMPLES=ON -DDLP_EXAMPLES_LINK_STATIC=OFF ..
cmake -DBUILD_TESTING=ON -DDLP_TESTING_LINK_STATIC=OFF ..
cmake -DBUILD_BENCHMARKS=ON -DDLP_BENCHMARKS_LINK_STATIC=OFF ..
Verify linking with ldd:
# Static linking - no libaocl-dlp.so should appear
ldd examples/classic/simple_gemm_f32
# Dynamic linking - libaocl-dlp.so should appear
ldd examples/classic/simple_gemm_f32
Note: Static linking provides better performance by eliminating dynamic library loading overhead, which is especially beneficial for benchmarks and performance testing.
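The ldd check above can be scripted so that the result does not have to be read by eye. The sketch below assumes it is run from the build directory and that the example binary lives at the path used above. The function name `uses_shared_dlp` is hypothetical.

```shell
# Sketch: report whether a binary depends on the shared AOCL-DLP library.
# uses_shared_dlp is a hypothetical helper name.

uses_shared_dlp() {
    # Reads ldd output on stdin; succeeds if libaocl-dlp.so is listed.
    grep 'libaocl-dlp\.so' >/dev/null
}

# Binary to inspect; defaults to the example used in this document.
bin="${1:-examples/classic/simple_gemm_f32}"

if [ -x "$bin" ]; then
    if ldd "$bin" 2>/dev/null | uses_shared_dlp; then
        echo "$bin: dynamically linked against libaocl-dlp.so"
    else
        echo "$bin: libaocl-dlp.so not listed (static link expected)"
    fi
else
    echo "skipping: $bin not found or not executable"
fi
```

Passing a different path as the first argument checks a test or benchmark binary instead.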
You can specify different build types:
# Debug build
cmake -DCMAKE_BUILD_TYPE=Debug ..
# Release build (default)
cmake -DCMAKE_BUILD_TYPE=Release ..
# Release with debug info
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
# Coverage build (for code coverage analysis)
cmake -DCMAKE_BUILD_TYPE=Coverage ..
By default, the Kernel Dispatch Table (KDT) is configured with 16 buckets and 128 chains for optimal memory usage and quick kernel queries. If necessary, it can be manually configured as below:
cmake -DDLP_KDT_TABLE_SIZE=<number_of_buckets> -DDLP_KDT_CHAIN_SIZE=<number_of_chains> ..
Enable and run tests and benchmarks in one place:
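One possible shape for such a combined run is sketched below, using only options documented in the table above together with standard `cmake --build` and `ctest` invocations. It assumes the script is started from the project root. The `run` wrapper and the `DRY_RUN` variable are illustrative additions that print each command and execute it only when `DRY_RUN=0`. Benchmark executable names are not given in this document, so the sketch stops after the tests.

```shell
# Sketch: configure, build, and test with benchmarks enabled.
# `run` and DRY_RUN are hypothetical conveniences, not part of AOCL-DLP.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = execute them

run() {
    # Echo the command, then execute it unless in dry-run mode.
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Configure with tests (CTest integration on) and benchmarks.
run cmake -S . -B build \
    -DBUILD_TESTING=ON \
    -DDLP_TESTING_CTEST_DISABLED=OFF \
    -DBUILD_BENCHMARKS=ON

# Build with the generator's default parallelism.
run cmake --build build --parallel

# Run the registered tests, showing output only for failures.
run ctest --test-dir build --output-on-failure
```

Running with `DRY_RUN=0` executes the three steps in order. The default dry-run mode is useful for reviewing the exact command lines first.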
- Out-of-tree builds: Always build in a separate build/ directory to keep sources clean.
- Custom install prefix:
  cmake -DCMAKE_INSTALL_PREFIX=/opt/aocl-dlp ..
- Verbose output:
  # Make
  make VERBOSE=1
  # Ninja
  ninja -v
- Clean cache:
  rm -rf build/* && cd build && cmake ..
- Parallel builds: Leverage all cores with -j$(nproc) (Make) or default Ninja parallelism.
AOCL-DLP uses a modern CMake build system structured as follows:
- Main CMakeLists.txt: Orchestrates the overall build process
- cmake/dlp_variables.cmake: Sets project variables, languages and standards
- cmake/dlp_core_options.cmake: Defines core library options and threading models
- cmake/dlp_testing.cmake: Defines testing options and infrastructure
- cmake/dlp_benchmark.cmake: Defines benchmarking options and infrastructure
- cmake/dlp_build_options.cmake: Defines build target options (examples, sanitizers)
- cmake/dlp_documentation.cmake: Defines documentation options and generation
- cmake/dlp_dependencies.cmake: Manages OpenMP dependencies
- cmake/dlp_compiler_flags_linux.cmake: Sets compiler flags for Linux
- cmake/dlp_compiler_flags_windows.cmake: Sets compiler flags for Windows
- cmake/dlp_install.cmake: Manages installation rules
- cmake/dlp_extensions.cmake: Defines file extensions
If you encounter issues with the selected threading model:
- Make sure the required libraries are installed on your system:
  - For OpenMP: OpenMP development libraries
- For OpenMP-specific issues, point CMake at your OpenMP installation:
  cmake -DDLP_THREADING_MODEL=openmp -DDLP_OPENMP_ROOT=/path/to/openmp ..
Make sure your compiler supports:
- C11 standard for C code
- C++17 standard for C++ code
To speed up the build process, use parallel compilation:
make -j$(nproc)  # Linux
- Warnings may appear during compilation (-Werror is currently disabled)
- Some platforms may require specific environment setup for threading model detection