🚀 GGUF-Runner

A lightweight, streamlined guide and script collection for running GGUF (GPT-Generated Unified Format) models locally on Windows using llama.cpp.

📺 Preview

C:\GGUF-Runner> run-ai-chat.bat
Loading model: models/AIModel.gguf...
llm_load_tensors: offloading 99 layers to GPU
.................................................. done.

User: Hello!
Assistant: Hello! I am your local AI assistant running via GGUF-Runner. How can I help you today?

User: _

💻 Hardware Requirements

To ensure a smooth experience, please check the following requirements:

Component	Minimum	Recommended
CPU	x64 with AVX support	x64 with AVX2/AVX-512
RAM	8 GB (for 7B models)	16 GB+
GPU	Integrated Graphics	NVIDIA RTX (6GB+ VRAM)
Storage	5 GB+ (Model dependent)	SSD (Fast loading)

Note: For full GPU offloading, your GPU's VRAM should exceed the model file size, with some room left over for the context cache. If the model does not fit, offload only part of it (see Customization).
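The rule of thumb in the note can be checked before you launch anything. The sketch below is only an illustration: `fits_in_vram`, `model_size_bytes`, and the 10% `headroom` figure are hypothetical names and values chosen for this example, not part of llama.cpp, and the VRAM amount has to be entered by hand.

```python
import os

GIB = 1024 ** 3  # bytes per gibibyte


def model_size_bytes(path: str) -> int:
    """Return the size of the .gguf file on disk, in bytes."""
    return os.path.getsize(path)


def fits_in_vram(model_bytes: int, vram_bytes: int, headroom: float = 0.10) -> bool:
    """Return True if the model file fits in VRAM with a safety margin.

    `headroom` is an assumed fraction of VRAM kept free for the context
    cache and the desktop; it is a guess, not a llama.cpp setting.
    """
    usable = vram_bytes * (1.0 - headroom)
    return model_bytes <= usable


if __name__ == "__main__":
    path = os.path.join("models", "AIModel.gguf")
    vram_gib = 6  # assumption: replace with your card's VRAM in GiB
    if os.path.exists(path):
        ok = fits_in_vram(model_size_bytes(path), vram_gib * GIB)
        print("Full offload looks feasible." if ok else "Consider offloading fewer layers.")
    else:
        print(f"{path} not found")
```

If the check fails, lowering --n-gpu-layers (see Customization) is the usual remedy.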

🛠️ Setup Instructions

Step 1: Download llama.cpp

Visit the llama.cpp releases page.
Download the build that matches your system (exact asset names vary between releases and usually include a build number):
- For GPU acceleration (NVIDIA): llama-cuda-winx64.zip
- For CPU only: llama-winx64.zip
Extract the contents into a folder (e.g., C:\GGUF-Runner).

Step 2: Prepare Your Model

Create a folder named models inside the folder where you extracted llama.cpp (e.g., C:\GGUF-Runner\models).
Place your .gguf model file inside.
- Rename it to AIModel.gguf or update the scripts accordingly.
- Example Path: models\AIModel.gguf
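A quick sanity check can catch a misplaced or incomplete download before llama.cpp tries to load it. GGUF files begin with the four ASCII bytes `GGUF`, so a minimal sketch only needs to read the file header; the helper name `looks_like_gguf` is made up for this example.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of every GGUF file


def looks_like_gguf(path) -> bool:
    """Return True if `path` exists and starts with the GGUF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


if __name__ == "__main__":
    model = Path("models") / "AIModel.gguf"
    status = "ok" if looks_like_gguf(model) else "missing or not a GGUF file"
    print(f"{model}: {status}")
```

This only inspects the header; it does not prove the rest of the file is intact.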

🚀 How to Run

Option A: Standard CLI (Direct)

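Open a terminal in the folder and run the command below. In llama.cpp releases from mid-2024 onward the binary is named llama-cli.exe; if your download has no main.exe, substitute that name in every command in this guide.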

main.exe -m models\AIModel.gguf -p "Hello!"

Option B: Interactive Chat Mode

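For a continuous conversation, use interactive mode. Here -i keeps the session open, -r "User:" hands control back to you whenever the model prints that text, and -e tells llama.cpp to treat the \n in the prompt as a real newline:

main.exe -m models\AIModel.gguf --color -i -e -r "User:" --prompt "User: Hello\nAssistant:"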

Option C: One-Click Launch (Recommended)

Create a file named run-ai-chat.bat in the root folder and paste the following:

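@echo off
rem Switch to the folder that contains this script, even if the path has spaces.
cd /d "%~dp0"
main.exe -m models\AIModel.gguf --n-gpu-layers 99 --color -i -e -r "User:" --prompt "User: Hello\nAssistant:"
rem Keep the window open so any error message stays visible.
pause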

Tip: Double-click this .bat file to start chatting instantly!

⚙️ Customization

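GPU Offloading: --n-gpu-layers 99 in the .bat file asks llama.cpp to offload up to 99 layers, which is more than most models have, so in practice it means "offload everything". Lower the number if you run out of VRAM; the remaining layers then run on the CPU.
Context Size: Add -c followed by a token count (e.g., -c 2048) to set the context window. A larger window lets the model keep more of the conversation in view, at the cost of more RAM/VRAM.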
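If you change these settings often, one option is to assemble the command line in a small script instead of editing the .bat file by hand. The sketch below only builds the argument list from the flags used in this guide; `build_command` and its parameters are hypothetical names for illustration, and the default `exe` assumes the binary is still called main.exe.

```python
def build_command(model="models\\AIModel.gguf", gpu_layers=99, ctx_size=None, exe="main.exe"):
    """Build the interactive-chat argument list used in this guide.

    gpu_layers maps to --n-gpu-layers; ctx_size, when given, maps to -c.
    """
    if gpu_layers < 0:
        raise ValueError("gpu_layers must be zero or greater")
    cmd = [exe, "-m", model, "--n-gpu-layers", str(gpu_layers),
           "--color", "-i", "-r", "User:"]
    if ctx_size is not None:
        cmd += ["-c", str(ctx_size)]
    # A real newline is passed here, so no escape processing is needed.
    cmd += ["--prompt", "User: Hello\nAssistant:"]
    return cmd


if __name__ == "__main__":
    # Example: offload 20 layers and use a 2048-token context window.
    print(build_command(gpu_layers=20, ctx_size=2048))
```

The returned list can be passed directly to `subprocess.run(...)` from the folder that contains the executable.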

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

Developed by Rajjit Laishram
