llama-box + agent.cpp Combined Project

This project combines llama-box (multimodal inference server) with agent.cpp (C++ agent framework) into a single unified binary.

tl;dr: build, then run llama-server --model /your/model.gguf --mmproj etc., then run llama-box-agent in another terminal. -sbot and -wspr are enabled by default. Have an IP webcam running on the specified server for visual support.

Architecture

GGUF model (+ mmproj)
        |
        v
  llama-box server (port 8080)
  - Inference backend
  - Tool call parsing  
  - Multimodal support (vision)
  - OpenAI-compatible /v1 API
        |
        v
  agent.cpp client library
  - Agent loop
  - Tool execution
  - Memory management
        |
        v
  main.cpp (your application)
  - Tool implementations
  - Business logic
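
The flow in the diagram can be sketched as a loop: ask the model, run any tools it requests, feed the results back, and repeat until it answers in plain text. This is a minimal sketch under stated assumptions, not the agent.cpp API: the types ToolCall, ModelReply, and Message, and the function run_agent, are hypothetical names chosen for illustration.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical types for illustration; not the agent.cpp API.
struct ToolCall {
    std::string name;
    std::string arguments;  // raw JSON arguments, passed through untouched
};

struct ModelReply {
    std::string content;               // final text when no tools are requested
    std::vector<ToolCall> tool_calls;  // empty means the model is done
};

struct Message {
    std::string role;  // "user", "assistant", or "tool"
    std::string content;
};

using ModelFn = std::function<ModelReply(const std::vector<Message>&)>;
using ToolFn = std::function<std::string(const ToolCall&)>;

// Ask the model, run any requested tools, append their results to the
// history, and repeat until the model answers or max_steps is exhausted.
std::string run_agent(const ModelFn& model, const ToolFn& run_tool,
                      std::vector<Message>& history, int max_steps = 8) {
    for (int step = 0; step < max_steps; ++step) {
        ModelReply reply = model(history);
        if (reply.tool_calls.empty()) {
            history.push_back({"assistant", reply.content});
            return reply.content;
        }
        for (const ToolCall& call : reply.tool_calls) {
            history.push_back({"tool", run_tool(call)});
        }
    }
    return "";  // step budget exhausted without a final answer
}
```

In a real build, the model callback would wrap an HTTP request to the server's /v1 API and the tool callback would wrap the tool dispatcher. The step cap keeps a model that never stops requesting tools from looping forever.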

Components

llama-box: Multimodal inference server from gpustack
- Supports GGUF models
- Tool calling with native function parsing
- Vision support via mmproj files
- REST API at port 8080

agent.cpp: C++ agent framework from Mozilla AI
- HTTP client to llama-box
- Tool call dispatcher
- Cross-session memory
- Agent state management

Stable Diffusion: Image generation via stable-diffusion.cpp
- Latent diffusion models
- Text-to-image generation

Directory Structure

llama-box-agent/
├── CMakeLists.txt          # Unified build configuration
├── main.cpp                # Application entry point
├── src/
│   ├── agent.cpp           # Agent loop implementation
│   ├── agent.h             # Agent interface
│   ├── model.cpp           # Model implementation
│   ├── model.h             # Model interface
│   └── callback.h          # Callback definitions
├── llama-box/              # llama-box server (subdir)
│   └── llama-box/          # Server source
├── llama.cpp/              # llama.cpp (subdir)
└── stable-diffusion.cpp/   # SD (subdir)

Building

cd ~/Desktop/llama-box-agent
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j $(nproc)

The binary will be at: build/llama-box-agent

Running

# Terminal 1: Start llama-box server
./build/llama-box/llama-box \
  --model ~/models/model.gguf \
  --mmproj ~/models/mmproj.gguf \
  --host 127.0.0.1 \
  --port 8080 \
  -c 32768 \
  -np 4 \
  --n-gpu-layers 99

# Terminal 2: Run agent
./build/llama-box-agent

Tool Calling

The project defines three built-in tools:

file_search(query, path): Search files by name in directory tree
read_file(filepath): Read full file contents
write_file(filepath, content): Write content to file

These are defined in main.cpp with JSON Schema and dispatched by name.
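
Dispatch by name can be sketched as a map from tool name to handler. This is an illustrative sketch, not the code in main.cpp: ToolArgs, ToolHandler, ToolRegistry, and read_file_tool are hypothetical names, and arguments are modelled as plain string pairs rather than a JSON type.

```cpp
#include <fstream>
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical signature: each tool takes its arguments as name -> value
// strings and returns a text result. main.cpp may use a JSON type instead.
using ToolArgs = std::map<std::string, std::string>;
using ToolHandler = std::function<std::string(const ToolArgs&)>;

class ToolRegistry {
public:
    void add(const std::string& name, ToolHandler handler) {
        handlers_[name] = std::move(handler);
    }

    // Look the tool up by name. Failures come back as text so the model
    // can see the error and react instead of the process crashing.
    std::string dispatch(const std::string& name, const ToolArgs& args) const {
        auto it = handlers_.find(name);
        if (it == handlers_.end()) {
            return "error: unknown tool '" + name + "'";
        }
        try {
            return it->second(args);
        } catch (const std::exception& e) {
            return std::string("error: ") + e.what();
        }
    }

private:
    std::map<std::string, ToolHandler> handlers_;
};

// Example handler for read_file(filepath): returns the full file contents.
std::string read_file_tool(const ToolArgs& args) {
    std::ifstream in(args.at("filepath"));  // throws if the argument is missing
    if (!in) {
        throw std::runtime_error("cannot open file");
    }
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}
```

Registering a handler is one call, for example reg.add("read_file", read_file_tool), after which reg.dispatch("read_file", args) runs it. Returning errors as strings is a design choice in this sketch; it lets a missing argument or an unknown tool name reach the model as an ordinary tool result.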

Configuration

Context window: 32768 tokens (configurable)
Temperature: 0.6 (recommended for tool calling)
Parallel slots: 4 (configurable)
GPU layers: 99 (offload all layers)
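
One way to keep these values in one place is a small struct that expands into the launch flags from the Running section. This is a sketch only: ServerConfig and to_args are hypothetical names, and temperature is left out because it is a sampling setting rather than one of the launch flags shown above.

```cpp
#include <string>
#include <vector>

// Defaults mirror the values listed above; the struct itself is hypothetical.
struct ServerConfig {
    std::string model_path;
    std::string mmproj_path;     // leave empty for text-only models
    std::string host = "127.0.0.1";
    int port = 8080;
    int context_tokens = 32768;  // -c
    int parallel_slots = 4;      // -np
    int gpu_layers = 99;         // --n-gpu-layers
};

// Build the argument list for the server launch command.
std::vector<std::string> to_args(const ServerConfig& cfg) {
    std::vector<std::string> args;
    auto add = [&args](const std::string& flag, const std::string& value) {
        args.push_back(flag);
        args.push_back(value);
    };
    add("--model", cfg.model_path);
    if (!cfg.mmproj_path.empty()) {
        add("--mmproj", cfg.mmproj_path);
    }
    add("--host", cfg.host);
    add("--port", std::to_string(cfg.port));
    add("-c", std::to_string(cfg.context_tokens));
    add("-np", std::to_string(cfg.parallel_slots));
    add("--n-gpu-layers", std::to_string(cfg.gpu_layers));
    return args;
}
```

The resulting vector can be joined into a command line or passed to whatever process-spawning call the application uses.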

KV Cache Quant Trick (saves ~30% VRAM)

Use cache quantization for large contexts:

llama-box --cache-type-k q8_0 --cache-type-v q8_0
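
To see roughly where the saving comes from, the cache size can be estimated from the model shape. The sketch below uses the common approximation that K and V each hold n_layers × n_ctx × n_kv_heads × head_dim values. The ModelShape struct and the example dimensions are assumptions for illustration, not measurements from this project.

```cpp
#include <cstdint>

// Bytes per cached value. f16 stores 2 bytes per value; q8_0 packs 32
// int8 values plus one 2-byte scale into a 34-byte block.
constexpr double kBytesF16 = 2.0;
constexpr double kBytesQ8_0 = 34.0 / 32.0;

// Hypothetical model shape for illustration.
struct ModelShape {
    int n_layers;
    int n_kv_heads;
    int head_dim;
};

// Approximate KV cache size in bytes for a given context length.
double kv_cache_bytes(const ModelShape& m, std::int64_t n_ctx,
                      double bytes_per_value) {
    // The factor 2 accounts for the separate K and V tensors.
    double values = 2.0 * m.n_layers * static_cast<double>(n_ctx) *
                    m.n_kv_heads * m.head_dim;
    return values * bytes_per_value;
}
```

For an assumed shape of 32 layers, 8 KV heads, and head dimension 128 at a 32768-token context, this gives 4 GiB for f16 and about 2.1 GiB for q8_0. That is close to half of the cache itself; the saving as a share of total VRAM is smaller because model weights are unaffected, so the actual figure depends on the model and context size.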

Troubleshooting

Empty tool responses: Check that the model supports function calling (e.g. Qwen2.5, Hermes3, Llama3.1)

OOM errors: Add --cache-type-k q8_0 --cache-type-v q8_0 or reduce model size

Multimodal issues: Verify mmproj matches model family exactly

Agent can't find server: Confirm http://127.0.0.1:8080/v1/models returns JSON

Built: May 2026
