AI World Cup

A public, reproducible benchmark for comparing LLMs on FIFA World Cup 2026 predictions.

Live Project

Website: https://jonaidshianifar.github.io/ai-world-cup/
Repository: https://github.com/jonaidshianifar/ai-world-cup

AI World Cup is an independent open-source project for testing how well different Large Language Models can predict the FIFA World Cup 2026 tournament when they are given the same football data, the same prompt, and the same scoring rules.

The project does not call LLM APIs. Instead, it uses a transparent manual workflow: generate one standardized tournament prompt, send it manually to different LLMs, import their responses, validate their predictions, score them as real results become available, and publish the leaderboard on a public website.

The public website was created and refined with support from ChatGPT 5.5 Plus.

What Is AI World Cup?

AI World Cup is a benchmark and public leaderboard for comparing LLM predictions on World Cup 2026.

It asks questions such as:

Which free LLM predicts match outcomes most accurately?
Which model gives the best full-tournament forecast?
Are some models better at group-stage predictions than knockout predictions?
Do models become overconfident when predicting football results?
How different are the predictions from ChatGPT, Gemini, Claude, DeepSeek, Qwen, Mistral, Grok, Perplexity, and other assistants?

The goal is not to create betting advice. The goal is to create a reproducible, transparent, and public experiment in LLM-based forecasting.

Why This Project Exists

Football prediction is difficult because it combines structured data, uncertainty, historical context, team strength, tournament dynamics, injuries, form, and randomness. LLMs are increasingly used for reasoning and forecasting, but their predictions are often difficult to compare fairly.

AI World Cup solves this by fixing the benchmark conditions:

Every model receives the same generated prompt.
Every model receives the same tournament data snapshot.
Every model must return the same JSON structure.
Raw responses are saved exactly as returned.
Predictions are validated before scoring.
The leaderboard is updated using transparent scoring rules.
The public website displays the leaderboard, tournament explorer, predictions, methodology, data snapshots, and results.

How It Works

flowchart LR
    A[Football Data Sources] --> B[Raw Data Snapshots]
    B --> C[SQLite Database]
    C --> D[Full-Tournament Prompt Generator]
    D --> E[Manual Submission to LLMs]
    E --> F[Raw Model Responses]
    F --> G[JSON Parser and Validator]
    G --> H[Structured Predictions]
    H --> I[Scoring Engine]
    I --> J[Leaderboard]
    J --> K[Static Website Export]
    K --> L[GitHub Pages Website]

Main Workflow

AI World Cup uses one full-tournament prompt as the recommended benchmark workflow.

aiwc data sync --sources openfootball,worldcup26
aiwc data status

aiwc prompts generate-tournament --version v1
aiwc prompts list

The generated prompt is then manually sent to each LLM. Each model returns one JSON response containing:

group-stage match predictions
predicted group standings
knockout-stage predictions
final ranking
award predictions
confidence values
short reasoning fields

The response is saved and imported:

aiwc responses import-tournament \
  --prompt-id PROMPT_ID \
  --model-name "Gemini Free" \
  --provider "Google" \
  --response-file data/responses/manual/gemini_tournament_v1.json

Then predictions are evaluated and exported to the website:

aiwc evaluate tournament --completed-only
aiwc leaderboard tournament
aiwc site export
aiwc readme leaderboard

What the Website Shows

The website is a static React application deployed with GitHub Pages. It reads exported JSON files and does not require a backend server.

The website includes:

project overview
model leaderboard
tournament explorer for comparing real results and model brackets
total points by model
outcome accuracy
exact score accuracy
average confidence
match-by-match predictions with real results
group-stage and knockout filtering inside predictions
team, model, group, and round filters
champion predictions
model details
prompt protocol and submission methodology
scoring rules
data snapshot information

Website data is exported from the Python pipeline into:

website/public/data/

Leaderboard

The leaderboard ranks models using evaluated predictions. Scores are updated as official results become available.

The main leaderboard includes:

Metric	Meaning
Total points	Sum of all scoring components
Group-stage points	Points from official group-stage fixture predictions
Group-standing points	Points from predicted group rankings and qualifiers
Knockout points	Points from predicted tournament progression
Outcome accuracy	Percentage of matches where the model predicted win/draw/loss correctly
Exact score accuracy	Percentage of matches where the model predicted the exact score
Average confidence	Mean confidence reported by the model
Champion prediction	The model's predicted tournament winner
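
As a rough illustration of how the accuracy and confidence columns could be derived, the sketch below aggregates per-match evaluation rows. The row keys (`outcome_correct`, `exact_correct`, `confidence`) and the function name are hypothetical and are not taken from the repository; the actual pipeline may compute these metrics differently, for example over a different set of matches.

```python
from statistics import mean


def leaderboard_metrics(rows):
    """Aggregate evaluated matches into leaderboard-style metrics.

    rows: list of dicts with hypothetical keys 'outcome_correct' (bool),
    'exact_correct' (bool), and 'confidence' (float).
    """
    if not rows:
        # No completed matches yet: report zeros instead of dividing by zero.
        return {
            "outcome_accuracy": 0.0,
            "exact_score_accuracy": 0.0,
            "average_confidence": 0.0,
        }
    total = len(rows)
    return {
        # Accuracies are percentages of evaluated matches.
        "outcome_accuracy": 100.0 * sum(r["outcome_correct"] for r in rows) / total,
        "exact_score_accuracy": 100.0 * sum(r["exact_correct"] for r in rows) / total,
        # Mean of the confidence values reported by the model.
        "average_confidence": mean(r["confidence"] for r in rows),
    }
```

With two evaluated matches, one of them predicted exactly and one missed entirely, this sketch reports 50.0 for both accuracy metrics.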

Search-enabled assistants can be tracked separately from non-search models to keep the benchmark fair.

Current Tournament Leaderboard

Generated by aiwc readme leaderboard on 2026-06-12 11:37 UTC.

Rank	Model	Provider	Total	Group stage	Champion	Outcome acc.	Exact score acc.	Avg confidence
1	Claude Sonnet 4.6	Anthropic	11	11	Argentina	50.0%	50.0%	0.65
2	DeepSeek	DeepSeek	11	11	Brazil	50.0%	50.0%	0.71
3	GPT-5.5	OpenAI	11	11	Brazil	50.0%	50.0%	0.63
4	GPT-5.5 Thinking	OpenAI	11	11	Spain	50.0%	50.0%	0.61
5	Grok	xAI	11	11	France	50.0%	50.0%	0.69
6	Mistral Medium 3.5	Mistral AI	11	11	France	50.0%	50.0%	0.73
7	Perplexity Pro	Perplexity	11	11	Brazil	50.0%	50.0%	0.62
8	Gemini	Google	5	5	France	50.0%	0.0%	0.70
9	Perplexity	Perplexity	5	5	Brazil	50.0%	0.0%	0.61
10	Qwen 3.7	Qwen	5	5	Argentina	50.0%	0.0%	0.72

For the full interactive table and charts, open the website leaderboard page.

Scoring System

AI World Cup uses a points-based scoring system. The system is designed to reward both match-level accuracy and tournament-level forecasting.

Match Prediction Scoring

Prediction type	Points
Exact score	5
Correct outcome	3
Correct winner	2
Correct goal difference	1

For draws, the correct winner bonus is not added separately because the draw is already captured by the outcome score.
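
The table and the draw rule above can be read as additive components. The following is a minimal sketch under that assumption; the function names are hypothetical and the real scoring engine may combine the components differently. Under this reading, an exact non-draw prediction earns 5 + 3 + 2 + 1 = 11 points and a correct winner with the wrong margin earns 3 + 2 = 5.

```python
# Point values copied from the Match Prediction Scoring table.
EXACT_SCORE = 5
CORRECT_OUTCOME = 3
CORRECT_WINNER = 2
CORRECT_GOAL_DIFFERENCE = 1


def outcome(home_goals, away_goals):
    """Classify a scoreline using the labels from the response format."""
    if home_goals > away_goals:
        return "HOME_WIN"
    if home_goals < away_goals:
        return "AWAY_WIN"
    return "DRAW"


def score_match(pred_home, pred_away, real_home, real_away):
    """Score one match. Assumption: all components are summed."""
    points = 0
    actual = outcome(real_home, real_away)
    if outcome(pred_home, pred_away) == actual:
        points += CORRECT_OUTCOME
        # The winner bonus is skipped for draws, as stated above.
        if actual != "DRAW":
            points += CORRECT_WINNER
    if pred_home - pred_away == real_home - real_away:
        points += CORRECT_GOAL_DIFFERENCE
    if (pred_home, pred_away) == (real_home, real_away):
        points += EXACT_SCORE
    return points
```

For example, `score_match(2, 1, 2, 1)` returns 11, `score_match(1, 0, 2, 0)` returns 5, and a correctly predicted draw with the wrong score, such as `score_match(1, 1, 0, 0)`, returns 4.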

Group Standing Scoring

Prediction type	Points
Correct group winner	5
Correct top two teams	5
Correct qualified team from group	3 per team
Exact team rank	2 per team

Knockout and Tournament Scoring

Prediction type	Points
Correct team reaches Round of 32	2
Correct team reaches Round of 16	4
Correct team reaches quarter-final	6
Correct team reaches semi-final	8
Correct finalist	12
Correct champion	20
Correct runner-up	10
Correct third place	8
Correct fourth place	5

Total tournament points are the sum of all applicable scoring components.
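
One possible way to apply the knockout table is to compare, for each round, the set of teams a model predicted to reach it with the set that actually did. The sketch below assumes that data shape; the round-name keys, both function names, and the choice to award points per matching team are illustrative assumptions, not the repository's confirmed behavior. Whether a correct champion also earns finalist points is not stated above, so the two parts are kept as separate functions.

```python
# Points per correctly predicted team, copied from the table above.
ROUND_POINTS = {
    "Round of 32": 2,
    "Round of 16": 4,
    "Quarter-final": 6,
    "Semi-final": 8,
    "Final": 12,
}
# Points for exact final placements, copied from the table above.
PLACEMENT_POINTS = {
    "champion": 20,
    "runner_up": 10,
    "third_place": 8,
    "fourth_place": 5,
}


def score_progression(predicted, actual):
    """predicted/actual: dict mapping round name -> set of team names."""
    total = 0
    for round_name, points in ROUND_POINTS.items():
        # Teams that the model placed in this round and that really reached it.
        hits = predicted.get(round_name, set()) & actual.get(round_name, set())
        total += points * len(hits)
    return total


def score_final_ranking(predicted, actual):
    """predicted/actual: dicts shaped like the final_ranking object."""
    total = 0
    for slot, points in PLACEMENT_POINTS.items():
        # Only score placements that are already known.
        if slot in actual and predicted.get(slot) == actual[slot]:
            total += points
    return total
```

Because rounds missing from `actual` contribute nothing, the same functions can be run while the tournament is still in progress.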

Manual LLM Submission Protocol

To keep the benchmark fair, all model responses should follow the same protocol:

Generate the tournament prompt from this repository.
Use the same prompt version for every model.
Use the same data snapshot for every model.
Copy the full prompt without editing it.
Send the prompt manually to each LLM.
Disable web search when possible for the main leaderboard.
Save each model response exactly as returned.
Import the raw response into the repository.
Record the model name, provider, access mode, date, and notes.
Evaluate only after official match results are available.

Models to Compare

The project is designed for manually tested LLMs and assistants, including free and pro models such as:

Claude Sonnet 4.6
DeepSeek
Gemini
Mistral Medium 3.5
GPT-5.5
GPT-5.5 Thinking
Perplexity
Perplexity Pro
Qwen 3.7
Grok

Recommended separation:

Main leaderboard: models using only the provided prompt data.
Search-augmented leaderboard: tools that use live web search or external retrieval.

The current website shows submitted model details on the Methodology page, alongside the prompt protocol and scoring rules.

Required Tournament Response Format

The full-tournament prompt expects a JSON object with this structure:

{
  "metadata": {
    "project": "AI World Cup",
    "prompt_version": "v1",
    "data_snapshot_id": "...",
    "model_name": "...",
    "provider": "...",
    "prediction_created_at": "YYYY-MM-DD"
  },
  "group_stage_predictions": [
    {
      "match_number": 1,
      "stage": "Group Stage",
      "group": "A",
      "home_team": "...",
      "away_team": "...",
      "predicted_home_goals": 0,
      "predicted_away_goals": 0,
      "predicted_outcome": "HOME_WIN|DRAW|AWAY_WIN",
      "predicted_winner": "team name or DRAW",
      "confidence": 0.0,
      "reasoning_short": "maximum 40 words"
    }
  ],
  "predicted_group_standings": [
    {
      "group": "A",
      "rank": 1,
      "team": "...",
      "points": 0,
      "goals_for": 0,
      "goals_against": 0,
      "goal_difference": 0
    }
  ],
  "knockout_predictions": [
    {
      "match_number": 73,
      "stage": "Round of 32",
      "home_team": "...",
      "away_team": "...",
      "predicted_home_goals": 0,
      "predicted_away_goals": 0,
      "predicted_outcome": "HOME_WIN|AWAY_WIN",
      "predicted_winner": "...",
      "confidence": 0.0,
      "reasoning_short": "maximum 40 words"
    }
  ],
  "final_ranking": {
    "champion": "...",
    "runner_up": "...",
    "third_place": "...",
    "fourth_place": "..."
  },
  "awards_predictions": {
    "top_scorer": "...",
    "best_player": "...",
    "best_young_player": "...",
    "best_goalkeeper": "..."
  }
}
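
To illustrate the kind of consistency check this structure allows, here is a minimal sketch that validates a single `group_stage_predictions` entry. The function name and error messages are hypothetical, and the assumption that `confidence` lies between 0 and 1 is inferred from the example values rather than stated by the project. Knockout entries are left out because it is unclear how shootout results map to the goal fields.

```python
REQUIRED_FIELDS = (
    "match_number", "home_team", "away_team",
    "predicted_home_goals", "predicted_away_goals",
    "predicted_outcome", "predicted_winner", "confidence",
)


def validate_group_prediction(entry):
    """Return a list of problems found in one group-stage entry."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if problems:
        return problems

    home = entry["predicted_home_goals"]
    away = entry["predicted_away_goals"]
    goals_ok = all(
        isinstance(g, int) and not isinstance(g, bool) and g >= 0
        for g in (home, away)
    )
    if not goals_ok:
        return ["goals must be non-negative integers"]

    # Derive the outcome and winner implied by the predicted scoreline.
    if home > away:
        outcome, winner = "HOME_WIN", entry["home_team"]
    elif home < away:
        outcome, winner = "AWAY_WIN", entry["away_team"]
    else:
        outcome, winner = "DRAW", "DRAW"

    if entry["predicted_outcome"] != outcome:
        problems.append("predicted_outcome does not match predicted goals")
    if entry["predicted_winner"] != winner:
        problems.append("predicted_winner does not match predicted goals")

    confidence = entry["confidence"]
    if (
        isinstance(confidence, bool)
        or not isinstance(confidence, (int, float))
        or not 0.0 <= confidence <= 1.0
    ):
        problems.append("confidence must be a number between 0 and 1")
    return problems
```

An empty list means the entry passed every check in this sketch; any returned message would flag the prediction before scoring.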

Data Sources

AI World Cup can use multiple football data sources.

Source	API key	Purpose
OpenFootball	No	Fixtures and historical World Cup data
worldcup26.ir	No	World Cup 2026 teams, groups, games, stadiums
football-data.org	Optional	Additional match and standing data
API-Football	Optional	Additional fixtures, teams, rounds, standings

No LLM API keys are required. The LLM comparison workflow remains offline after model responses are manually collected.

Technical Architecture

The project contains two main systems:

Python benchmark pipeline
Static public website

ai-world-cup/
  src/ai_world_cup/        # Python package and CLI
  data/                    # raw data, snapshots, prompts, responses, exports
  website/                 # React/Vite static website
  website/public/data/     # exported JSON used by the website
  docs/                    # methodology and protocol documentation
  tests/                   # pytest test suite
  .github/workflows/       # GitHub Pages deployment

Python Pipeline

The Python pipeline handles:

data synchronization
raw snapshots
SQLite storage
prompt generation
response importing
JSON parsing
validation
scoring
leaderboard generation
website data export

Website

The website uses:

React
Vite
TypeScript
Tailwind CSS
Recharts
TanStack Table
GitHub Pages

The site is static and reads JSON files from:

website/public/data/

Installation

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env

Optional API keys can be added to .env:

FOOTBALL_DATA_TOKEN=
API_FOOTBALL_KEY=

Common Commands

Sync Data

aiwc data sync --sources openfootball,worldcup26
aiwc data status

Generate Full-Tournament Prompt

aiwc prompts generate-tournament --version v1
aiwc prompts list

Import a Model Response

aiwc responses import-tournament \
  --prompt-id PROMPT_ID \
  --model-name "ChatGPT Free" \
  --provider "OpenAI" \
  --response-file data/responses/manual/chatgpt_tournament_v1.json

Evaluate and Show Leaderboard

aiwc evaluate tournament --completed-only
aiwc leaderboard tournament

Export Website Data

aiwc site export

Update README Leaderboard

aiwc readme leaderboard

Running the Website Locally

cd website
npm install
npm run dev

Build the static website:

npm run build

GitHub Pages Deployment

The website is deployed using GitHub Pages and GitHub Actions.

To deploy:

Push the repository to GitHub.
Open repository settings.
Go to Pages.
Set source to GitHub Actions.
Push to main.

The site will be published at:

https://jonaidshianifar.github.io/ai-world-cup/

To update website data:

aiwc evaluate tournament --completed-only
aiwc site export
aiwc readme leaderboard
git add README.md website/public/data
git commit -m "Update AI World Cup website data and README leaderboard"
git push

For website UI changes, also stage the relevant website/src files before committing.

Daily Automation

For local daily updates after games start, run:

./daily_update.sh

The script syncs match data, recalculates tournament scores, exports website JSON, prints the current tournament leaderboard, commits changed website/data files, and pushes to GitHub. If no exported data changed, it exits without creating a commit.

You can override the default commit message:

COMMIT_MESSAGE="Update results" ./daily_update.sh

The repository also includes .github/workflows/daily-update.yml, which can run the same update automatically in GitHub Actions. It runs every day at 07:00 UTC and can also be started manually from the GitHub Actions tab.

The daily GitHub workflow:

syncs the latest football data,
updates snapshots,
evaluates predictions against completed matches,
exports website JSON,
commits updated data,
pushes to main,
triggers the GitHub Pages deployment workflow.

This keeps the public leaderboard and charts up to date during the tournament.

Development

ruff format .
ruff check .
pytest

Frontend build:

cd website
npm install
npm run build

Methodology

AI World Cup follows these methodological principles:

1. Same Prompt

Every model receives the same full-tournament prompt.

2. Same Data Snapshot

Every prediction is linked to a specific data snapshot. This prevents unfair comparisons caused by changing fixture data or updated football information.
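
A simple way to check this condition across imported responses is to group them by the `prompt_version` and `data_snapshot_id` fields of the `metadata` object. The helper below is a hypothetical sketch, not part of the `aiwc` CLI; it assumes each response has already been parsed into a dictionary.

```python
from collections import defaultdict


def group_by_conditions(responses):
    """Map (prompt_version, data_snapshot_id) -> list of model names."""
    groups = defaultdict(list)
    for response in responses:
        meta = response["metadata"]
        key = (meta["prompt_version"], meta["data_snapshot_id"])
        groups[key].append(meta["model_name"])
    # More than one key means some models saw different benchmark conditions.
    return dict(groups)
```

If the returned dictionary has exactly one key, every response was produced under the same prompt version and snapshot.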

3. Manual Submission

The repository does not call LLM APIs. Manual submission makes the benchmark accessible to free models and avoids dependency on paid inference services.

4. Raw Response Preservation

Each model response is stored exactly as returned before parsing or validation.
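
Chat assistants often wrap JSON in Markdown fences or surround it with prose, so parsing usually has to locate the object inside the stored text. The sketch below shows one way to do that with the standard library while leaving the raw text untouched; the function name is hypothetical and the repository's parser may work differently.

```python
import json


def parse_raw_response(raw_text):
    """Return the first JSON object found in raw_text, without modifying it."""
    decoder = json.JSONDecoder()
    start = raw_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever text follows it.
            parsed, _end = decoder.raw_decode(raw_text, start)
            return parsed
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next opening brace.
            start = raw_text.find("{", start + 1)
    raise ValueError("no JSON object found in response")
```

Because the function only reads `raw_text`, the saved file can stay byte-for-byte identical to what the model returned.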

5. Structured Validation

Responses must follow the expected JSON schema. Invalid or inconsistent predictions are flagged before scoring.

6. Gradual Evaluation

Predictions can be evaluated gradually as World Cup matches are completed.

7. Transparent Leaderboard

Scoring rules are visible, deterministic, and applied equally to all models.

8. Search Separation

Search-enabled assistants should be evaluated separately unless all models are allowed to use web search.

Limitations

Football outcomes are highly uncertain.
LLMs may hallucinate unavailable statistics.
Free model versions can change over time.
Some assistants may silently use web search or hidden tools.
Long full-tournament prompts may be handled differently by different models.
The benchmark depends on the quality and availability of football data sources.

Not Betting Advice

AI World Cup is a research, benchmarking, and visualization project. It is not betting advice, financial advice, or a guarantee of any real-world match outcome.

Attribution

Project created by Jonaid Shianifar and Iias Faiud.

The public website and project documentation were created and refined with support from ChatGPT 5.5 Plus.

License

This project's source code is licensed under the MIT License.

Data, fixtures, team names, logos, competition information, and other football-related content may come from third-party sources and may be subject to their own licenses, terms of use, or attribution requirements.

AI World Cup is an independent project and is not affiliated with FIFA.
