LiteRT-LM is Google's production-ready, high-performance, open-source inference framework for deploying Large Language Models on edge devices.
Updated Jun 18, 2026 - C++
KnoLo Core is a local-first knowledge base engine built for small language models. It packages your documents into a compact .knolo file and enables fully deterministic querying, with no embeddings, vector databases, or cloud services required. Designed for on-device and edge LLM deployments.
On-device shell command generator for macOS Tahoe. Uses Apple's 3B model with dynamic few-shot retrieval from 21k tldr examples.
Declare the outcome, skip the logic. The developer-first infrastructure engine for structured, multi-turn AI conversations with built-in safety, async follow-ups, and native compliance.
Android 16 fork. AI as a platform primitive. Twelve capabilities, one shared runtime, every app. OEM-pluggable. Apache 2.0.
Agentic Android Open Source Project (AAOSP) — Android fork with native LLM system service, MCP-aware apps, and an agent-driven launcher. On-device Qwen 2.5 via llama.cpp. Apps declare tools in their manifest. The OS runs the model.
High-performance Android SDK for on-device LLM inference (GGUF). Privacy-focused, offline-first, and powered by llama.cpp with a clean Kotlin Coroutines API.
Apple FoundationModels API on iOS 18+. Same call site, native passthrough on iOS 26 (Apple Intelligence), CoreML / MLX backends on older OSes. Drop-in source compatible.
Reverse-engineering notes on fm, Apple's Foundation Models CLI in macOS 27: on-device model catalog (9M/85M/300M/3B + code/vision/speech), Private Cloud Compute, Siri local<->cloud routing, and the OpenAI-compatible 'fm serve' API.
Run Apple Intelligence, CoreML, and MLX models using a unified Swift interface for local language model sessions on iOS and macOS.
Ash — offline survival assistant for iOS. Gemma 4 E2B/E4B fully on-device (text · image · voice) with RAG-grounded answers over 56 emergency-response packs. Built for the Kaggle Gemma 4 Good Hackathon.
Run LLMs on Snapdragon NPU — including the 'unsupported' 8 Gen 1 (Hexagon v69). Verified at 31 tok/s on OnePlus 10 Pro.
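📱 Panoramic knowledge base on mobile AI operating systems: 334+ in-depth pages covering on-device large models, AI agents, chip adaptation, and inference optimization | Auto-updated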
iOS app that runs a local LLM on-device to transcribe meetings and generate structured notes — action items, decisions, and summaries. No cloud, no API keys, no data leaves the phone.
Kotlin Multiplatform engine for running Gemma LLMs on-device on Android via LiteRT-LM — stateful KV-cache chat sessions, resumable model management, function calling. Includes NativeLM, a private Local AI chat app. AGPL-3.0 / commercial.
JibarOS organization profile.
Android local music recommendation app powered by on-device LLM and RAG
Unified Kotlin API for on-device LLMs using each platform's built-in models.
Execution infrastructure for local-first AI. Reason locally, execute globally.