AOCL-DLP 5.3

BhaskarNallani released this 31 May 03:22

AOCL-DLP (Deep Learning Primitives) is a high-performance library that provides optimized deep learning primitives for AMD processors.
The library implements low-precision GEMM and batch GEMM APIs in several precision formats, post-operations that can be fused into the GEMM computation, symmetric quantization routines, and parallel execution via OpenMP.
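For readers unfamiliar with the term, the following is a minimal sketch of what symmetric quantization generally means: one scale factor, a zero-point fixed at 0, and a signed integer range. It is plain Python for illustration only. The function names `quantize_symmetric` and `dequantize_symmetric` are hypothetical and are not AOCL-DLP APIs, and the per-tensor scaling shown here is an assumption rather than a description of how the library computes its scales.

```python
def quantize_symmetric(values, num_bits=8):
    """Quantize floats to signed integers with a single scale (zero-point is 0).

    Hypothetical helper for illustration; not part of AOCL-DLP.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_symmetric(quantized, scale):
    """Recover approximate float values from the quantized integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q_weights, scale = quantize_symmetric(weights)
restored = dequantize_symmetric(q_weights, scale)
```

Because the zero-point is 0, a quantized GEMM only needs to multiply the integer accumulator by the product of the input scales, which is one reason symmetric schemes are common for weights.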
Highlights of AOCL-DLP 5.3
Added new GEMM APIs for pure FP16, F32×S8, BF16×S8→BF16 (with on-the-fly quantization), and BF16×U4 asymmetric weight-only quantization.
Delivered full JIT code generation for S8×S8 and U8×S8 GEMM/GEMV paths, including post-ops and column-major support.
Optimized BF16 and F32 JIT generators with AVX-512 GEMV, RD/k=1 kernel frameworks, BF16×S4 WOQ JIT, and batch GEMM JIT for int8.
Improved multi-threading with new thread-local/library-level APIs, smart factorization, PGO support, and small matrix optimizations.
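The BF16×U4 highlight above refers to asymmetric weight-only quantization (WOQ). As a rough sketch of the general idea, weights are stored as unsigned 4-bit codes with a scale and a zero-point, and are dequantized before being multiplied by the unquantized activations. The code below is plain Python with ordinary floats standing in for BF16. The names `quantize_asymmetric_u4` and `woq_dot` are hypothetical, the single per-tensor scale and zero-point are assumptions, and none of this reflects the actual AOCL-DLP API or kernel layout.

```python
def quantize_asymmetric_u4(weights):
    """Quantize floats to unsigned 4-bit codes (0..15) with a scale and zero-point.

    Hypothetical helper for illustration; not part of AOCL-DLP.
    """
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 15 if hi > lo else 1.0
    zero_point = round(-lo / scale)
    codes = [max(0, min(15, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def woq_dot(activations, codes, scale, zero_point):
    """Dot product that dequantizes each weight on the fly.

    Activations stay in floating point; only the weights are quantized.
    """
    return sum(a * (c - zero_point) * scale for a, c in zip(activations, codes))


weights = [-0.5, 0.0, 0.3, 1.0]
codes, scale, zero_point = quantize_asymmetric_u4(weights)
result = woq_dot([1.0, 1.0, 1.0, 1.0], codes, scale, zero_point)
```

Unlike the symmetric case, the zero-point lets the 16 available codes cover a range that is not centered on zero, which matters when only 4 bits are available per weight.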