A lightweight, streamlined guide and script collection for running GGUF (GPT-Generated Unified Format) models locally on Windows using llama.cpp.
C:\GGUF-Runner> run-ai-chat.bat
Loading model: models/AIModel.gguf...
llm_load_tensors: offloading 99 layers to GPU
.................................................. done.
User: Hello!
Assistant: Hello! I am your local AI assistant running via GGUF-Runner. How can I help you today?
User: _
To ensure a smooth experience, please check the following requirements:
| Component | Minimum | Recommended |
|---|---|---|
| CPU | x64 with AVX support | x64 with AVX2/AVX-512 |
| RAM | 8 GB (for 7B models) | 16 GB+ |
| GPU | Integrated Graphics | NVIDIA RTX (6GB+ VRAM) |
| Storage | 5 GB+ (Model dependent) | SSD (Fast loading) |
Note: For full GPU offloading, your GPU's VRAM should exceed the model file size, with some headroom left over for the conversation context.
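The note above can be turned into a rough check before choosing how many layers to offload. The Python sketch below is illustrative only: the function name `estimate_gpu_layers`, the 1 GiB reserve, and the assumption that every layer takes an equal share of the model file are simplifications introduced here, not values reported by llama.cpp.

```python
GIB = 1024 ** 3

def estimate_gpu_layers(model_bytes: int, vram_bytes: int, total_layers: int,
                        reserve_bytes: int = 1 * GIB) -> int:
    """Roughly estimate how many model layers fit in VRAM.

    Simplifying assumptions: each layer occupies an equal share of the
    model file, and reserve_bytes of VRAM is kept free for the context
    cache and other buffers.
    """
    if model_bytes <= 0 or total_layers <= 0:
        raise ValueError("model_bytes and total_layers must be positive")
    usable = max(vram_bytes - reserve_bytes, 0)
    per_layer = model_bytes / total_layers
    # Never report more layers than the model actually has.
    return min(total_layers, int(usable // per_layer))
```

The model size can be read with `os.path.getsize(r"models\AIModel.gguf")`. Treat the result as a starting point and lower it if loading still fails.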
- Visit the llama.cpp releases page.
- Download the appropriate version for your system:
  - For GPU acceleration (NVIDIA): `llama-cuda-winx64.zip`
  - For CPU only: `llama-winx64.zip`
- Extract the contents into a folder (e.g., `C:\GGUF-Runner`).
- Create a folder named `models` inside your `llama.cpp` directory.
- Place your `.gguf` model file inside.
  - Rename it to `AIModel.gguf` or update the scripts accordingly.
  - Example Path: `models\AIModel.gguf`
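A quick way to confirm the layout described above is to check for the `models` folder and the model file before launching anything. This is a minimal Python sketch. The helper name `find_model` is hypothetical, and the default file name simply mirrors the `AIModel.gguf` convention used in this guide.

```python
from pathlib import Path

def find_model(root: Path, name: str = "AIModel.gguf") -> Path:
    """Return the model path under root/models, or raise with a hint."""
    models_dir = root / "models"
    if not models_dir.is_dir():
        raise FileNotFoundError(f"Missing folder: {models_dir}")
    model = models_dir / name
    if model.is_file():
        return model
    # List whichever .gguf files are present so a rename is easy to spot.
    found = sorted(p.name for p in models_dir.glob("*.gguf"))
    hint = ", ".join(found) if found else "none"
    raise FileNotFoundError(f"{model} not found; .gguf files present: {hint}")
```

Calling `find_model(Path(r"C:\GGUF-Runner"))` either returns the expected path or explains what is missing.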
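Open a terminal in the folder and run:

main.exe -m models\AIModel.gguf -p "Hello!"

For a continuous conversation experience:

main.exe -m models\AIModel.gguf --color -i -r "User:" --prompt "User: Hello\nAssistant:"

Recent llama.cpp releases name this executable `llama-cli.exe`. If `main.exe` is missing from your download, substitute that name in the commands and scripts here.

Create a file named run-ai-chat.bat in the root folder and paste the following: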
@echo off
cd /d "%~dp0"
main.exe -m models\AIModel.gguf --n-gpu-layers 99 --color -i -r "User:" --prompt "User: Hello\nAssistant:"
pause

Tip: Double-click this .bat file to start chatting instantly!
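For anyone who would rather launch from Python than from a batch file, the script above corresponds roughly to the argument list below. This is a sketch, not part of GGUF-Runner: the function name `build_command` is hypothetical, and the flags are copied from the batch file rather than checked against any particular llama.cpp build.

```python
from pathlib import Path

def build_command(root: Path, n_gpu_layers: int = 99) -> list[str]:
    """Build the same argument list the batch file passes to main.exe."""
    exe = root / "main.exe"
    model = root / "models" / "AIModel.gguf"
    return [
        str(exe),
        "-m", str(model),
        "--n-gpu-layers", str(n_gpu_layers),
        "--color",
        "-i",
        "-r", "User:",
        # Raw string: passes a literal backslash-n, as the batch file does.
        "--prompt", r"User: Hello\nAssistant:",
    ]
```

Passing the returned list to `subprocess.run` would start the chat. Building it as a list avoids quoting problems when the install path contains spaces.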
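- GPU Offloading: Adjust `--n-gpu-layers 99` in the `.bat` file. Decrease this number if you run out of VRAM.
- Context Size: Add `-c 2048` to set the context window to 2048 tokens. A larger value lets the model keep more of the conversation in view, at the cost of more memory.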
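Context size has a memory cost because the model stores a key and a value entry for every token in the window. The back-of-the-envelope Python sketch below assumes an unquantized 16-bit cache and uses illustrative 7B-class dimensions (32 layers, 32 key/value heads of size 128). Those numbers are assumptions made for the example, not values read from any model file.

```python
def kv_cache_bytes(n_ctx: int, n_layers: int, n_kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Approximate key/value cache size for a transformer model.

    Each layer stores two tensors (keys and values), and each tensor
    holds n_ctx * n_kv_heads * head_dim values.
    """
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_value

# Illustrative 7B-class dimensions (assumed, not read from a model file).
for n_ctx in (2048, 4096):
    size = kv_cache_bytes(n_ctx, n_layers=32, n_kv_heads=32, head_dim=128)
    print(f"-c {n_ctx}: about {size / 1024**3:.1f} GiB of cache")
```

Under these assumptions the cache grows linearly with the context size: about 1 GiB at 2048 tokens and 2 GiB at 4096. A larger `-c` value may therefore call for a lower `--n-gpu-layers`.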
This project is licensed under the MIT License. See the LICENSE file for details.
Developed by Rajjit Laishram