multimodel-large-language-model

Here are 22 public repositories matching this topic...

FlagOpen / RoboBrain2.5

RoboBrain 2.5: Advanced version of RoboBrain. Depth in Sight, Time in Mind. 🎉🎉🎉

embodied-ai multimodel-large-language-model

Updated Feb 28, 2026
Python

inclusionAI / UI-Venus

UI-Venus is a native UI agent designed to perform precise GUI element grounding and effective navigation using only screenshots as input.

reinforcement-learning grounding multimodel-large-language-model ui-agent

Updated May 11, 2026
Python

JIA-Lab-research / Seg-Zero

Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"

reinforcement-learning segmentation multimodal multimodel-large-language-model reasoning-language-models

Updated Jan 17, 2026
Python

jqtangust / Robust-R1

🔥🔥🔥[AAAI 2026 Oral] Official Implementation of Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

multi-modal reasoning robustness trustworthy-ai large-language-models multimodel-large-language-model

Updated Jan 20, 2026
Python

sun-hailong / TVC

[ACL 2025] The code repository for "Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning" in PyTorch.

reasoning r1 cot forgetting mllms multimodel-large-language-model

Updated May 16, 2025
Python

theboringhumane / echoOLlama

🦙 echoOLlama: A real-time voice AI platform powered by local LLMs. Features WebSocket streaming, voice interactions, and OpenAI API compatibility. Built with FastAPI, Redis, and PostgreSQL. Perfect for private AI conversations and custom voice assistants.

agent docker docker-compose openai llama lgm realtime-api fastapi llm ollama llama3 multimodel-large-language-model

Updated Nov 9, 2024
Jupyter Notebook

BIGBALLON / BeyondCLIP

Not a neutral survey — a field manual for engineers who build, train, and ship multimodal retrieval at production scale. The C-L-I triangle (Compression · Localization · Instruction), MLLM encoders vs late interaction, MUVERA economics, and falsifiable forecasts through 2030.

information-retrieval retrieval image-search image-retrieval universal-embedding composed-image-retrieval text-image-retrieval large-language-models multimodel-large-language-model universal-multimodal-embedding

Updated Apr 20, 2026
HTML

xinyanghuang7 / Basic-Visual-Language-Model

Build a simple, basic multimodal large model from scratch. 🤖

visual-language-learning large-language-models visual-language-models multimodel-large-language-model

Updated Jun 19, 2024
Python

SufyanDanish / VLM-Survey-

A comprehensive survey of Vision–Language Models: Pretrained models, fine-tuning, prompt engineering, adapters, and benchmark datasets

Updated Mar 27, 2026

zhangguanghao523 / CMMCoT

[AAAI'26] Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

mcot cot chain-of-thought mllm multimodel-large-language-model qwen2-vl qwen2-5-vl

Updated Dec 5, 2025
Python

balaji1233 / AI-Radiology-Reporting

Uses MAIRA-2, a multimodal transformer designed to generate grounded or non-grounded radiology reports from chest X-rays.

pytorch radiology-reports llm multimodel-large-language-model

Updated May 9, 2025
Jupyter Notebook

ramihuunguyen / LLMP2

Evaluating ‘Graphical Perception’ with Multimodal Large Language Models

computer-vision deep-learning visual-reasoning graphical-perception multimodel-large-language-model chart-intepretation

Updated Feb 27, 2026
Jupyter Notebook

mubashir1837 / Luma-Health

Multi-Modal Healthcare Assistant

data-science deep-learning healthcare llm multimodel-large-language-model

Updated Jun 1, 2025
Python

sarahabumandil / Fair-Multi-Fusion-Research-

research-paper modality-fusion multimodel-large-language-model

Updated Jun 19, 2026

264Gaurav / gemma-image-text-ai

Gemma3 Vision - AI Image Analysis & Chat

sse server-sent-events image-analysis fastapi text-image ollama multimodel-large-language-model gemma3

Updated Dec 16, 2025
JavaScript

iamafridi / elaMath

ElaMath is a smart, voice-enabled math assistant that helps students solve and understand math problems using both spoken questions and images. It is powered by the multimodal meta-llama/llama-4-scout-17b-16e-instruct model via the Groq API, combined with Whisper for speech recognition and ElevenLabs/gTTS for natural voice responses.

ai llama groq groq-ai multimodel-large-language-model

Updated May 30, 2025
Python

iamafridi / elarova-2.0

Elarova — A smart, multimodal research assistant designed to help students by combining speech, text, and other input modes for efficient academic research and study support. Powered by state-of-the-art speech recognition, text-to-speech, and AI models, including meta-llama/llama-4-scout-17b-16e-instruct, with an easy-to-use Gradio web interface.

meta ai llama groq multimodel-large-language-model

Updated Jun 10, 2025
Python

Nileshsan / Digital-product-feature-multimodel-large-language-model

Create a tool that uses a multimodal LLM to describe testing instructions for any digital product's features, based on the screenshots.

python django django-rest-framework html-css-javascript multimodel-large-language-model

Updated Sep 9, 2024
Jupyter Notebook

KuchikiRenji / llmchat

LLMChat is an open-source, privacy-first AI chatbot (powering LLMChat.co). It is a Next.js + TypeScript monorepo that gives you one interface for multiple LLMs (OpenAI, Anthropic, Google, Groq, Ollama, etc.) with Deep Research and Pro Search modes, optional auth and credits, and local-first storage (IndexedDB) so that chat history stays in the browser.

workflow typescript nextjs monorepo gemini indexeddb openai dexie aichatbot claude tiptap turborepo langchain shadcnui multimodel-large-language-model deepresearch llmchat prosearch privacy-first-ai-agents

Updated Jan 31, 2026
TypeScript

saikiranvankudothu / MMDIS

Multimodal Document Intelligence for better understanding and context awareness of academic documents

localization document documentintelligence multimodel-large-language-model kimi-k2

Updated Jan 13, 2026
Jupyter Notebook
