Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
Updated May 8, 2025 - Python
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions (NeurIPS 2025)
Official implementation of the ICASSP-2022 paper "Text2Poster: Laying Out Stylized Texts on Retrieved Images"
[ISMIR 2025] A curated list of vision-to-music generation: methods, datasets, evaluation and challenges.
[NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
[ICLR 2024] Contextualized Diffusion Models for Text-Guided Image and Video Generation
[CVPR 2025 Highlight] Official repository for CoMM Dataset
[CVPR '23] Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models
[NeurIPS 2025] GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer
The code used to train and run inference with MMDocRAG
A Survey of Multimodal Retrieval-Augmented Generation
NeurIPS 2025 D&B | RAG-IGBench: benchmark for RAG-based interleaved image-text generation.
Unofficial PyTorch reproduction for Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling.
Unofficial PyTorch reproduction for OmniGen2, instruction-aligned multimodal generation and editing.
[ICML 2026] TMD-Bench: A Multi-Level Evaluation Paradigm for Music-Dance Co-Generation
Unofficial PyTorch reproduction for DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation.
Unified multimodal generation via image-flow matching in a shared visual latent space
Unofficial PyTorch reproduction for Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models.