A curated list of LLM/MLLM guardrails, safety benchmarks, guard models, jailbreak attacks, moderation datasets, and evaluation tools.
Updated Jun 6, 2026
RevealVLLMSafetyEval is a comprehensive pipeline for evaluating Vision-Language Models (VLMs) on their compliance with harm-related policies. It automates the creation of adversarial multi-turn datasets and the evaluation of model responses, supporting responsible AI development and red-teaming efforts.
VAlign-Robust: A research framework for quantifying and mitigating semantic hallucination drift in Vision-Language Models (VLMs) under sensory degradation and adversarial noise.