This repository was archived by the owner on May 4, 2026. It is now read-only.

rajjitlai/GGUF-Runner

🚀 GGUF-Runner


A lightweight, streamlined guide and script collection for running GGUF (GPT-Generated Unified Format) models locally on Windows using llama.cpp.


📺 Preview

C:\GGUF-Runner> run-ai-chat.bat
Loading model: models/AIModel.gguf...
llm_load_tensors: offloading 99 layers to GPU
.................................................. done.

User: Hello!
Assistant: Hello! I am your local AI assistant running via GGUF-Runner. How can I help you today?

User: _

💻 Hardware Requirements

To ensure a smooth experience, please check the following requirements:

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| CPU | x64 with AVX support | x64 with AVX2/AVX-512 |
| RAM | 8 GB (for 7B models) | 16 GB+ |
| GPU | Integrated graphics | NVIDIA RTX (6 GB+ VRAM) |
| Storage | 5 GB+ (model dependent) | SSD (fast loading) |

Note: For optimal performance, ensure your GPU VRAM is larger than the model file size for full GPU offloading.
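The rule of thumb in the note can be checked before launching. The following is a minimal sketch, not part of this repository: the function names and the 1 GiB headroom (an allowance for the context cache and anything else using the GPU) are assumptions chosen for illustration, and you supply the VRAM figure yourself.

```python
import os

GIB = 1024 ** 3  # bytes per GiB


def fits_in_vram(model_bytes: int, vram_gib: float, headroom_gib: float = 1.0) -> bool:
    """Return True if the model plus headroom fits in the given VRAM.

    headroom_gib is an assumed allowance, not a figure from llama.cpp.
    """
    return model_bytes / GIB + headroom_gib <= vram_gib


def check_model(path: str, vram_gib: float) -> bool:
    """Apply fits_in_vram to a model file on disk."""
    return fits_in_vram(os.path.getsize(path), vram_gib)
```

For example, `check_model(r"models\AIModel.gguf", vram_gib=6)` returns False when the file is too large for a 6 GB card, which suggests lowering --n-gpu-layers rather than offloading everything.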


🛠️ Setup Instructions

Step 1: Download llama.cpp

  1. Visit the llama.cpp releases page.
  2. Download the appropriate version for your system:
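    • For GPU acceleration (NVIDIA): the CUDA build for Windows x64, e.g. llama-cuda-winx64.zip (exact asset names vary between releases)
    • For CPU only: the plain Windows x64 build, e.g. llama-winx64.zip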
  3. Extract the contents into a folder (e.g., C:\GGUF-Runner).

Step 2: Prepare Your Model

  1. Create a folder named models inside the folder where you extracted llama.cpp (e.g., C:\GGUF-Runner).
  2. Place your .gguf model file inside.
    • Rename it to AIModel.gguf or update the scripts accordingly.
    • Example Path: models\AIModel.gguf
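A quick sanity check can catch a truncated or mislabeled download before llama.cpp tries to load it. GGUF files begin with the four ASCII bytes `GGUF`; the sketch below tests only that signature and does not validate the rest of the file. The function name is hypothetical.

```python
def looks_like_gguf(path: str) -> bool:
    """Return True if the file starts with the 4-byte GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

Calling `looks_like_gguf(r"models\AIModel.gguf")` should return True for a valid model file. A False result usually means the file is in a different format or the download is incomplete.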

🚀 How to Run

Option A: Standard CLI (Direct)

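Open a terminal in the extracted folder and run the command below. Recent llama.cpp releases name this binary llama-cli.exe instead of main.exe; substitute that name if main.exe is not present.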

main.exe -m models\AIModel.gguf -p "Hello!"

Option B: Interactive Chat Mode

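For a continuous conversation, start interactive mode (-i) with "User:" as the reverse prompt (-r), so control returns to you each time the model prints it: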

main.exe -m models\AIModel.gguf --color -i -r "User:" --prompt "User: Hello\nAssistant:"

Option C: One-Click Launch (Recommended)

Create a file named run-ai-chat.bat in the root folder and paste the following:

@echo off
cd /d "%~dp0"
main.exe -m models\AIModel.gguf --n-gpu-layers 99 --color -i -r "User:" --prompt "User: Hello\nAssistant:"
pause

Tip: Double-click this .bat file to start chatting instantly!


⚙️ Customization

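  • GPU Offloading: Adjust --n-gpu-layers 99 in the .bat file. Decrease this number if you run out of VRAM; layers that are not offloaded run on the CPU.
  • Context Size: Add -c 2048 to set the context window to 2048 tokens. A larger value lets the model keep more of the conversation in view, at the cost of more memory.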
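If you change these two settings often, a small launcher can build the command instead of editing the .bat file each time. This is a sketch under stated assumptions: the function names and defaults are hypothetical, it uses only the flags shown in this guide, and it assumes the binary is named main.exe as in the commands above.

```python
import subprocess
from typing import List, Optional


def build_command(model: str, gpu_layers: int = 99,
                  ctx_size: Optional[int] = None,
                  exe: str = "main.exe") -> List[str]:
    """Build the interactive-chat command line used in this guide."""
    cmd = [exe, "-m", model,
           "--n-gpu-layers", str(gpu_layers),
           "--color", "-i", "-r", "User:",
           # Python passes a real newline here, not the two characters \n.
           "--prompt", "User: Hello\nAssistant:"]
    if ctx_size is not None:
        cmd += ["-c", str(ctx_size)]  # context window in tokens
    return cmd


def launch(model: str, **options) -> int:
    """Run the chat and return the process exit code."""
    return subprocess.run(build_command(model, **options)).returncode
```

For instance, `launch(r"models\AIModel.gguf", gpu_layers=20, ctx_size=4096)` starts the same chat with fewer offloaded layers and a larger context window.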

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

Developed by Rajjit Laishram
