High-performance, cross-platform ONNX inference runtime plus quantization/serving toolkit. Run ONNX models on Apple Silicon NPU/GPU, arm64 and amd64 CPUs (with or without AVX-512 vectorization on Intel/AMD), NVIDIA GPUs, and WebGPU (transformers.js).
Convert PyTorch and other model formats into ONNX; chop up, reformat, and assemble ONNX machine learning models; and serve them directly in your Go application. Our goal is to make it easy to containerize and serve (or run locally) small- to medium-sized machine learning workloads at low cost and complexity, across runtimes and compute environments, in a format that is amenable to fine-tuning, model composition, and distributed inference/machine learning.
Contents will be populated incrementally from other Accretional internal/external inference repositories. Please stand by.