AMD Strix Halo local LLM guide: setup for Ryzen AI MAX+ 395 / Radeon 8060S, Ollama, llama.cpp Vulkan/RADV, ROCm, raw evidence, direct 100 t/s 30B Qwen, 140 t/s CHADROCK MTP, 120B GGUF.
Updated Jun 21, 2026 - Python
Operator-grade GPU monitor for NVIDIA GPUs with native GB10 / DGX Spark coherent UMA support — PSI pressure, clock detection, ConnectX-7 network layer
Unified Memory Abstraction Layer for AI Inference on AMD APUs and Intel iGPUs
A CUDA implementation of the transpose-free Quasi-Minimal Residual method
Local inference server for Apple Silicon — hot-swaps MLX models (LLM, vision, embeddings, TTS, STT) via OpenAI API
gpu thrashing: NVIDIA GPU Unified Memory diagnostic tool (architecture-aware, measurement-based, PCIe/coherent transport detection)
Talos-O (Omni): A sovereign, embodied agentic organism forged on AMD Strix Halo. Integrating the Chimera Kernel (Linux 7.0), Zero-Copy Introspection, and the Phronesis Engine. Built from First Principles.
Fundamentals of Accelerated Computing C/C++ is a course provided by NVIDIA.
NVML unified memory shim for NVIDIA DGX Spark Grace Blackwell GB10 - enables MAX Engine, PyTorch, and GPU monitoring
Apple Silicon Unified Memory for GPU-Accelerated Analytics — TPC-H benchmarks across DuckDB, NumPy, and MLX
Performance comparison of two different forms of memory management in CUDA
Empirical kernel scheduling characterization for NVIDIA GB10 (SM121a). Sweeps GEMM tile configurations, classifies PTX instruction paths, captures hardware telemetry
Run LLMs larger than your RAM — native GGUF inference engine with SSD streaming, no GPU required
Honest local LLM deployment planning and benchmarking for high unified-memory Macs and future Linux/NVIDIA rigs.
Unlock fast, local LLM inference on AMD-powered mini PCs delivering 65-87 t/s for large models without cloud or subscription costs
GB10-aware CUPTI Activity collector — runtime kind detection, phase management, and JSON output for hardware-coherent UMA platforms
Unified memory ML inference engine for AMD APU (RDNA 3.5 / gfx1150) — Custom HIP kernels with 2.2x speedup over standard allocation
Research into CUDA Unified Memory as a VRAM extension for LLM inference
Reproducible Pascal GPU Unified Memory benchmark with Nsight and nvprof profiling
3D U-Net with tf.keras for Large-Model-Support or Unified Memory