Silvio Chessari chessarisilvio

🔧 🧠

Specializing in LLM inference optimization, speculative decoding, and MoE routing architectures.
Building custom AI infrastructure that runs on consumer and datacenter GPUs via manual CUDA workarounds.
Bridging high-level models with embedded systems through MQTT, HID, and custom hardware controllers.

Focus	Language	Infrastructure	Hardware
LLM Inference	Python, C++, CUDA	llama.cpp, GGUF, EXL2	Tesla P40 (sm_61), RTX 3050
Embedded	C/C++, MicroPython	MQTT, Tailscale, Systemd	Arduino R4, ESP32, OLED
Automation	Bash, Node.js	Ollama, Docker	Custom Sensors, Thermal Arrays

🔬 LLM Research

nex2-mini-phase-twin-30b-lowvram-gguf-model: Custom 30B-parameter model quantized into a low-VRAM GGUF format for deployment on constrained consumer hardware.

add-video-input-support-to-llamacpp-mtmd: Extended the llama.cpp backend to support direct video frame ingestion for multimodal LLM inference pipelines.

automated-exl2-conversion-validation-pipeline: Toolchain for fixing the Qwen3.5 MoE architecture, converting it to EXL2 with mixed quantization, and validating the output against benchmarks.

benchmark-4-agent-wrappers-on-qwen3627b-llamacpp: Comparative analysis of four agent wrappers on Qwen3.6-27B, measuring latency, VRAM footprint, and response quality.
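
As an illustration of the latency part of such a comparison, here is a minimal sketch of a timing harness. It is not taken from the repository: `time_wrapper` and `echo_wrapper` are hypothetical names, and the stand-in wrapper only uppercases its prompt so the example runs without a model server. A real run would replace it with calls into each agent wrapper and would record VRAM footprint and response quality separately.

```python
import statistics
import time
from typing import Callable, List


def time_wrapper(call: Callable[[str], str], prompts: List[str]) -> dict:
    """Measure wall-clock latency (seconds) of `call` over a list of prompts."""
    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        call(prompt)  # the response is discarded; only timing is recorded here
        latencies.append(time.perf_counter() - start)
    return {
        "runs": len(latencies),
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }


def echo_wrapper(prompt: str) -> str:
    # Hypothetical stand-in for an agent wrapper; a real one would query the model.
    return prompt.upper()


# Map each wrapper name to its callable and time them on the same prompts.
wrappers = {"echo": echo_wrapper}
results = {name: time_wrapper(fn, ["hello", "world"]) for name, fn in wrappers.items()}
```

Running every wrapper against the same prompt list keeps the latency figures comparable across wrappers.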

🏗️ AI Infrastructure

automazione-bootwatchdog-ai-avanzato: Advanced Systemd watchdog that monitors llama-stack health, VRAM utilization, and token throughput, with Telegram alerting.
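
The checks such a watchdog performs can be sketched as a pure function that turns one metrics sample into a list of alert messages. This is an assumption-laden sketch, not the repository's code: the `Sample` fields, the `evaluate` function, and both default thresholds are hypothetical. Collecting the metrics and sending the Telegram message are deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One hypothetical metrics snapshot; vram_total_mib is assumed to be > 0."""
    healthy: bool
    vram_used_mib: float
    vram_total_mib: float
    tokens_per_s: float


def evaluate(sample: Sample,
             max_vram_fraction: float = 0.95,
             min_tokens_per_s: float = 1.0) -> List[str]:
    """Return one alert string per violated threshold (empty list if all is well)."""
    alerts = []
    if not sample.healthy:
        alerts.append("health check failed")
    vram_fraction = sample.vram_used_mib / sample.vram_total_mib
    if vram_fraction > max_vram_fraction:
        alerts.append(f"VRAM utilization at {vram_fraction:.0%}")
    if sample.tokens_per_s < min_tokens_per_s:
        alerts.append(f"token throughput at {sample.tokens_per_s:.2f} tok/s")
    return alerts


ok_alerts = evaluate(Sample(True, 1000.0, 24576.0, 20.0))
bad_alerts = evaluate(Sample(False, 24000.0, 24576.0, 0.2))
```

Keeping the threshold logic free of I/O makes it easy to test without a GPU or a messaging token; the alert strings can then be handed to whatever notifier is in use.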
