Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞
Updated Jun 3, 2026 - Python
Official code repo for NeurIPS 2025 Spotlight paper, "Debate or Vote: Which Yields Better Decisions in Multi-Agent LLMs?"
Framework: Multi-Agent LLMs For Conversational Task-Solving (MALLM)
Research-backed methodology for multi-AI collaborative decision-making with structured debate, consensus synthesis, and bias reduction
Source code for the paper: Hear Both Sides: Efficient Multi-Agent Debate via Diversity-Aware Message Retention
Human-in-the-loop adversarial workflows for high-stakes research audit: from ChatGPT-Gemini duels to 4-model MAD.
Code for "Multiple LLM Agents Debate for Equitable Cultural Alignment" [ACL 2025 Oral]
Code review, but with 5 models arguing first.
Multi-model deliberative design review — a Claude Code skill that runs structured debate between Gemini and GPT to surface blind spots in architecture decisions.
A brutally fault-tolerant Mixture-of-Agents (MoA) pipeline built in pure Python. Designed to orchestrate chaotic, round-robin LLM proxy endpoints through a rigorous 4-stage Agentic Workflow (Generate ➔ Cross-Critique ➔ Rebuttal ➔ Judge). Built to eradicate hallucination and guarantee absolute accuracy in complex, multi-step reasoning tasks.
Three Claude Code skills for working with Codex CLI: codex-bridge (one-shot Codex calls), mad-build (Claude+Codex collaboration with cross-review), and mad-research (three-stream adversarial audit of papers, grants, reports with anonymized cross-critique and fresh-Codex synthesis).
An adversarial AI expert workshop that stress-tests a research paper (rival-tradition referees argue; every comment quote-grounded and independently re-verified) and then rebuilds it: tracked-changes redline, clean version, your code re-run under a provenance wall, and a replication package. A Claude Code skill.
Research paper on how agentic debate pipelines can be constructed to reduce hallucinations in LLMs with open-source and commercial models
Generate research papers autonomously by chatting with OpenClaw, using Python 3.11+, with a self-evolving framework and extensive test coverage.
AI Agent Workspace Redesign: A structured multi-agent debate methodology for managing AI agent workspaces (memory, file organization, protection tiers, boot sequences)
Supporting code for the study on multi-agent debate protocols
NeurIPS paper code: evaluating and enhancing Large Language Models (LLMs) on mathematical datasets with a Multi-Agent Debate architecture, without traditional fine-tuning or Retrieval-Augmented Generation. The project explores strategies to improve LLM mathematical reasoning.
Multi-LLM debate orchestrator that drives ChatGPT, Claude, and DeepSeek web UIs (no API keys) through a 5-phase loop: propose → critique → revise → synthesize → ratify-or-veto. Editorial dark UI.
Run your decisions through a jury of 12 AI minds before you commit.
Build autonomous experiment loops that edit files, run tests, and keep only improvements for any project type