A public, reproducible benchmark for comparing LLMs on FIFA World Cup 2026 predictions.
Website: https://jonaidshianifar.github.io/ai-world-cup/
Repository: https://github.com/jonaidshianifar/ai-world-cup
AI World Cup is an independent open-source project for testing how well different Large Language Models can predict the FIFA World Cup 2026 tournament when they are given the same football data, the same prompt, and the same scoring rules.
The project does not call LLM APIs. Instead, it uses a transparent manual workflow: generate one standardized tournament prompt, send it manually to different LLMs, import their responses, validate their predictions, score them as real results become available, and publish the leaderboard on a public website.
The public website was created and refined with support from ChatGPT 5.5 Plus.
AI World Cup is a benchmark and public leaderboard for comparing LLM predictions on World Cup 2026.
It asks questions such as:
- Which free LLM predicts match outcomes most accurately?
- Which model gives the best full-tournament forecast?
- Are some models better at group-stage predictions than knockout predictions?
- Do models become overconfident when predicting football results?
- How different are the predictions from ChatGPT, Gemini, Claude, DeepSeek, Qwen, Mistral, Grok, Perplexity, and other assistants?
The goal is not to create betting advice. The goal is to create a reproducible, transparent, and public experiment in LLM-based forecasting.
Football prediction is difficult because it combines structured data, uncertainty, historical context, team strength, tournament dynamics, injuries, form, and randomness. LLMs are increasingly used for reasoning and forecasting, but their predictions are often difficult to compare fairly.
AI World Cup solves this by fixing the benchmark conditions:
- Every model receives the same generated prompt.
- Every model receives the same tournament data snapshot.
- Every model must return the same JSON structure.
- Raw responses are saved exactly as returned.
- Predictions are validated before scoring.
- The leaderboard is updated using transparent scoring rules.
- The public website displays the leaderboard, tournament explorer, predictions, methodology, data snapshots, and results.
flowchart LR
A[Football Data Sources] --> B[Raw Data Snapshots]
B --> C[SQLite Database]
C --> D[Full-Tournament Prompt Generator]
D --> E[Manual Submission to LLMs]
E --> F[Raw Model Responses]
F --> G[JSON Parser and Validator]
G --> H[Structured Predictions]
H --> I[Scoring Engine]
I --> J[Leaderboard]
J --> K[Static Website Export]
K --> L[GitHub Pages Website]
AI World Cup uses one full-tournament prompt as the recommended benchmark workflow.
aiwc data sync --sources openfootball,worldcup26
aiwc data status
aiwc prompts generate-tournament --version v1
aiwc prompts list
The generated prompt is then manually sent to each LLM. Each model returns one JSON response containing:
- group-stage match predictions
- predicted group standings
- knockout-stage predictions
- final ranking
- award predictions
- confidence values
- short reasoning fields
The response is saved and imported:
aiwc responses import-tournament \
--prompt-id PROMPT_ID \
--model-name "Gemini Free" \
--provider "Google" \
--response-file data/responses/manual/gemini_tournament_v1.json
Then predictions are evaluated and exported to the website:
aiwc evaluate tournament --completed-only
aiwc leaderboard tournament
aiwc site export
aiwc readme leaderboard
The website is a static React application deployed with GitHub Pages. It reads exported JSON files and does not require a backend server.
The website includes:
- project overview
- model leaderboard
- tournament explorer for comparing real results and model brackets
- total points by model
- outcome accuracy
- exact score accuracy
- average confidence
- match-by-match predictions with real results
- group-stage and knockout filtering inside predictions
- team, model, group, and round filters
- champion predictions
- model details
- prompt protocol and submission methodology
- scoring rules
- data snapshot information
Website data is exported from the Python pipeline into:
website/public/data/
The leaderboard ranks models using evaluated predictions. Scores are updated as official results become available.
The main leaderboard includes:
| Metric | Meaning |
|---|---|
| Total points | Sum of all scoring components |
| Group-stage points | Points from official group-stage fixture predictions |
| Group-standing points | Points from predicted group rankings and qualifiers |
| Knockout points | Points from predicted tournament progression |
| Outcome accuracy | Percentage of matches where the model predicted win/draw/loss correctly |
| Exact score accuracy | Percentage of matches where the model predicted the exact score |
| Average confidence | Mean confidence reported by the model |
| Champion prediction | The model's predicted tournament winner |
Search-enabled assistants can be tracked separately from non-search models to keep the benchmark fair.
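As a rough illustration of the three accuracy-style metrics in the table, the sketch below computes them from a list of evaluated matches. It is not the repository's implementation: the function name `summarize` and the `actual_home_goals` / `actual_away_goals` fields are hypothetical, and it assumes average confidence is taken over evaluated matches only.

```python
from statistics import mean


def sign(x: int) -> int:
    """Return 1, 0, or -1 for a positive, zero, or negative goal difference."""
    return (x > 0) - (x < 0)


def summarize(rows: list[dict]) -> dict:
    """Compute accuracy metrics over evaluated (completed) matches."""
    if not rows:
        raise ValueError("no evaluated matches yet")
    outcome_hits = 0
    exact_hits = 0
    for r in rows:
        pred = (r["predicted_home_goals"], r["predicted_away_goals"])
        # Hypothetical field names for the real result.
        real = (r["actual_home_goals"], r["actual_away_goals"])
        outcome_hits += sign(pred[0] - pred[1]) == sign(real[0] - real[1])
        exact_hits += pred == real
    return {
        "outcome_accuracy": outcome_hits / len(rows),
        "exact_score_accuracy": exact_hits / len(rows),
        "average_confidence": mean(r["confidence"] for r in rows),
    }


rows = [
    {"predicted_home_goals": 2, "predicted_away_goals": 1,
     "actual_home_goals": 2, "actual_away_goals": 1, "confidence": 0.75},
    {"predicted_home_goals": 1, "predicted_away_goals": 1,
     "actual_home_goals": 0, "actual_away_goals": 2, "confidence": 0.5},
]
print(summarize(rows))
```

With one exact hit and one miss, both accuracy values come out at 0.5 and the average confidence at 0.625.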
Generated by aiwc readme leaderboard on 2026-06-12 11:37 UTC.
| Rank | Model | Provider | Total | Group stage | Group standings | Knockout | Champion | Outcome acc. | Exact score acc. | Avg confidence |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 | Anthropic | 11 | 11 | 0 | 0 | Argentina | 50.0% | 50.0% | 0.65 |
| 2 | DeepSeek | DeepSeek | 11 | 11 | 0 | 0 | Brazil | 50.0% | 50.0% | 0.71 |
| 3 | GPT-5.5 | OpenAI | 11 | 11 | 0 | 0 | Brazil | 50.0% | 50.0% | 0.63 |
| 4 | GPT-5.5 Thinking | OpenAI | 11 | 11 | 0 | 0 | Spain | 50.0% | 50.0% | 0.61 |
| 5 | Grok | xAI | 11 | 11 | 0 | 0 | France | 50.0% | 50.0% | 0.69 |
| 6 | Mistral Medium 3.5 | Mistral AI | 11 | 11 | 0 | 0 | France | 50.0% | 50.0% | 0.73 |
| 7 | Perplexity Pro | Perplexity | 11 | 11 | 0 | 0 | Brazil | 50.0% | 50.0% | 0.62 |
| 8 | Gemini | | 5 | 5 | 0 | 0 | France | 50.0% | 0.0% | 0.70 |
| 9 | Perplexity | Perplexity | 5 | 5 | 0 | 0 | Brazil | 50.0% | 0.0% | 0.61 |
| 10 | Qwen 3.7 | Qwen | 5 | 5 | 0 | 0 | Argentina | 50.0% | 0.0% | 0.72 |
For the full interactive table and charts, open the website leaderboard page.
AI World Cup uses a points-based scoring system. The system is designed to reward both match-level accuracy and tournament-level forecasting.
| Prediction type | Points |
|---|---|
| Exact score | 5 |
| Correct outcome | 3 |
| Correct winner | 2 |
| Correct goal difference | 1 |
For draws, the correct winner bonus is not added separately because the draw is already captured by the outcome score.
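A minimal Python sketch of how these components might combine for a single match, assuming they are additive. That is an assumption, although it is consistent with the 11-point and 5-point totals in the current leaderboard. The function names are hypothetical, and awarding the goal-difference point on draws is also an assumption.

```python
def outcome(home: int, away: int) -> str:
    """Classify a scoreline as HOME_WIN, DRAW, or AWAY_WIN."""
    if home > away:
        return "HOME_WIN"
    if home < away:
        return "AWAY_WIN"
    return "DRAW"


def score_match(pred: tuple[int, int], actual: tuple[int, int]) -> int:
    """Score one match prediction, assuming the components are additive."""
    points = 0
    actual_outcome = outcome(*actual)
    if outcome(*pred) == actual_outcome:
        points += 3  # correct outcome
        if actual_outcome != "DRAW":
            points += 2  # correct winner (not added for draws)
    if pred[0] - pred[1] == actual[0] - actual[1]:
        points += 1  # correct goal difference
    if pred == actual:
        points += 5  # exact score
    return points


print(score_match((2, 1), (2, 1)))  # exact non-draw score: 5 + 3 + 2 + 1 = 11
print(score_match((1, 0), (3, 1)))  # right winner, wrong margin: 3 + 2 = 5
print(score_match((1, 1), (1, 1)))  # exact draw: 5 + 3 + 1 = 9
print(score_match((0, 2), (1, 0)))  # wrong outcome: 0
```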
| Prediction type | Points |
|---|---|
| Correct group winner | 5 |
| Correct top two teams | 5 |
| Correct qualified team from group | 3 per team |
| Exact team rank | 2 per team |
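The group-standing rules could be applied roughly as in the sketch below. Several details are assumptions rather than documented behavior: the top-two bonus ignores order, qualifiers are passed in explicitly (third-placed teams can also advance in the 2026 format), and all components stack. `score_group` and the placeholder team names are hypothetical.

```python
def score_group(
    predicted: list[str],
    actual: list[str],
    predicted_qualifiers: set[str],
    actual_qualifiers: set[str],
) -> int:
    """Score one group's predicted standings (lists are ordered by rank)."""
    points = 0
    if predicted[0] == actual[0]:
        points += 5  # correct group winner
    if set(predicted[:2]) == set(actual[:2]):
        points += 5  # correct top two teams (order ignored: an assumption)
    # 3 points for each team correctly predicted to qualify.
    points += 3 * len(predicted_qualifiers & actual_qualifiers)
    # 2 points for each team placed at exactly the right rank.
    points += 2 * sum(p == a for p, a in zip(predicted, actual))
    return points


predicted = ["Team A", "Team B", "Team C", "Team D"]
actual = ["Team A", "Team C", "Team B", "Team D"]
total = score_group(
    predicted,
    actual,
    predicted_qualifiers={"Team A", "Team B"},
    actual_qualifiers={"Team A", "Team C", "Team B"},
)
print(total)  # winner 5 + qualifiers 2 * 3 + exact ranks 2 * 2 = 15
```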
| Prediction type | Points |
|---|---|
| Correct team reaches Round of 32 | 2 |
| Correct team reaches Round of 16 | 4 |
| Correct team reaches quarter-final | 6 |
| Correct team reaches semi-final | 8 |
| Correct finalist | 12 |
| Correct champion | 20 |
| Correct runner-up | 10 |
| Correct third place | 8 |
| Correct fourth place | 5 |
Total tournament points are the sum of all applicable scoring components.
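One way to read the progression table is sketched below: each stage is a set of teams that reached it, and a model earns the stage value for every team it placed there correctly. This assumes the stage points are cumulative and that placement points stack on top of them. The names `STAGE_POINTS`, `PLACE_POINTS`, and `score_progression` are hypothetical.

```python
STAGE_POINTS = {
    "round_of_32": 2,
    "round_of_16": 4,
    "quarter_final": 6,
    "semi_final": 8,
    "final": 12,
}
PLACE_POINTS = {"champion": 20, "runner_up": 10, "third_place": 8, "fourth_place": 5}


def score_progression(
    pred_stages: dict[str, set[str]],
    actual_stages: dict[str, set[str]],
    pred_ranking: dict[str, str],
    actual_ranking: dict[str, str],
) -> int:
    """Score tournament progression; stages not yet played simply score nothing."""
    points = 0
    for stage, value in STAGE_POINTS.items():
        hits = pred_stages.get(stage, set()) & actual_stages.get(stage, set())
        points += value * len(hits)
    for place, value in PLACE_POINTS.items():
        # Only score a placement once the real result is known.
        if place in actual_ranking and pred_ranking.get(place) == actual_ranking[place]:
            points += value
    return points


pred_stages = {"round_of_32": {"A", "B", "C"}, "round_of_16": {"A", "B"}}
actual_stages = {"round_of_32": {"A", "C", "D"}, "round_of_16": {"A"}}
print(score_progression(pred_stages, actual_stages, {"champion": "A"}, {}))  # 2 * 2 + 4 * 1 = 8
print(score_progression(pred_stages, actual_stages, {"champion": "A"}, {"champion": "A"}))  # 8 + 20 = 28
```

Treating missing stages and placements as "not yet known" lets the same function be rerun as results arrive during the tournament.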
To keep the benchmark fair, all model responses should follow the same protocol:
- Generate the tournament prompt from this repository.
- Use the same prompt version for every model.
- Use the same data snapshot for every model.
- Copy the full prompt without editing it.
- Send the prompt manually to each LLM.
- Disable web search when possible for the main leaderboard.
- Save each model response exactly as returned.
- Import the raw response into the repository.
- Record the model name, provider, access mode, date, and notes.
- Evaluate only after official match results are available.
The project is designed for manually tested LLMs and assistants, including free and pro models such as:
- Claude Sonnet 4.6
- DeepSeek
- Gemini
- Mistral Medium 3.5
- GPT-5.5
- GPT-5.5 Thinking
- Perplexity
- Perplexity Pro
- Qwen 3.7
- Grok

Recommended separation:
- Main leaderboard: models using only the provided prompt data.
- Search-augmented leaderboard: tools that use live web search or external retrieval.
The current website shows submitted model details on the Methodology page, alongside the prompt protocol and scoring rules.
The full-tournament prompt expects a JSON object with this structure:
{
"metadata": {
"project": "AI World Cup",
"prompt_version": "v1",
"data_snapshot_id": "...",
"model_name": "...",
"provider": "...",
"prediction_created_at": "YYYY-MM-DD"
},
"group_stage_predictions": [
{
"match_number": 1,
"stage": "Group Stage",
"group": "A",
"home_team": "...",
"away_team": "...",
"predicted_home_goals": 0,
"predicted_away_goals": 0,
"predicted_outcome": "HOME_WIN|DRAW|AWAY_WIN",
"predicted_winner": "team name or DRAW",
"confidence": 0.0,
"reasoning_short": "maximum 40 words"
}
],
"predicted_group_standings": [
{
"group": "A",
"rank": 1,
"team": "...",
"points": 0,
"goals_for": 0,
"goals_against": 0,
"goal_difference": 0
}
],
"knockout_predictions": [
{
"match_number": 73,
"stage": "Round of 32",
"home_team": "...",
"away_team": "...",
"predicted_home_goals": 0,
"predicted_away_goals": 0,
"predicted_outcome": "HOME_WIN|AWAY_WIN",
"predicted_winner": "...",
"confidence": 0.0,
"reasoning_short": "maximum 40 words"
}
],
"final_ranking": {
"champion": "...",
"runner_up": "...",
"third_place": "...",
"fourth_place": "..."
},
"awards_predictions": {
"top_scorer": "...",
"best_player": "...",
"best_young_player": "...",
"best_goalkeeper": "..."
}
}
AI World Cup can use multiple football data sources.
| Source | API key | Purpose |
|---|---|---|
| OpenFootball | No | Fixtures and historical World Cup data |
| worldcup26.ir | No | World Cup 2026 teams, groups, games, stadiums |
| football-data.org | Optional | Additional match and standing data |
| API-Football | Optional | Additional fixtures, teams, rounds, standings |
No LLM API keys are required. The LLM comparison workflow remains offline after model responses are manually collected.
The project contains two main systems:
- Python benchmark pipeline
- Static public website
ai-world-cup/
src/ai_world_cup/ # Python package and CLI
data/ # raw data, snapshots, prompts, responses, exports
website/ # React/Vite static website
website/public/data/ # exported JSON used by the website
docs/ # methodology and protocol documentation
tests/ # pytest test suite
.github/workflows/ # GitHub Pages deployment
The Python pipeline handles:
- data synchronization
- raw snapshots
- SQLite storage
- prompt generation
- response importing
- JSON parsing
- validation
- scoring
- leaderboard generation
- website data export
The website uses:
- React
- Vite
- TypeScript
- Tailwind CSS
- Recharts
- TanStack Table
- GitHub Pages
The site is static and reads JSON files from:
website/public/data/
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
Optional API keys can be added to .env:
FOOTBALL_DATA_TOKEN=
API_FOOTBALL_KEY=
aiwc data sync --sources openfootball,worldcup26
aiwc data status
aiwc prompts generate-tournament --version v1
aiwc prompts list
aiwc responses import-tournament \
--prompt-id PROMPT_ID \
--model-name "ChatGPT Free" \
--provider "OpenAI" \
--response-file data/responses/manual/chatgpt_tournament_v1.json
aiwc evaluate tournament --completed-only
aiwc leaderboard tournament
aiwc site export
aiwc readme leaderboard
cd website
npm install
npm run dev
Build the static website:
npm run build
The website is deployed using GitHub Pages and GitHub Actions.
To deploy:
- Push the repository to GitHub.
- Open repository settings.
- Go to Pages.
- Set source to GitHub Actions.
- Push to main.
The site will be published at:
https://jonaidshianifar.github.io/ai-world-cup/
To update website data:
aiwc evaluate tournament --completed-only
aiwc site export
aiwc readme leaderboard
git add README.md website/public/data
git commit -m "Update AI World Cup website data and README leaderboard"
git push
For website UI changes, also stage the relevant website/src files before committing.
For local daily updates after games start, run:
./daily_update.sh
The script syncs match data, recalculates tournament scores, exports website JSON, prints the current tournament leaderboard, commits changed website/data files, and pushes to GitHub. If no exported data changed, it exits without creating a commit.
You can override the default commit message:
COMMIT_MESSAGE="Update results" ./daily_update.sh
The repository also includes .github/workflows/daily-update.yml, which can run the same update automatically in GitHub Actions. It runs every day at 07:00 UTC and can also be started manually from the GitHub Actions tab.
The daily GitHub workflow:
- syncs the latest football data,
- updates snapshots,
- evaluates predictions against completed matches,
- exports website JSON,
- commits updated data,
- pushes to main,
- triggers the GitHub Pages deployment workflow.
This keeps the public leaderboard and charts up to date during the tournament.
ruff format .
ruff check .
pytest
Frontend build:
cd website
npm install
npm run build
AI World Cup follows these methodological principles:
Every model receives the same full-tournament prompt.
Every prediction is linked to a specific data snapshot. This prevents unfair comparisons caused by changing fixture data or updated football information.
The repository does not call LLM APIs. Manual submission makes the benchmark accessible to free models and avoids dependency on paid inference services.
Each model response is stored exactly as returned before parsing or validation.
Responses must follow the expected JSON schema. Invalid or inconsistent predictions are flagged before scoring.
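The kind of consistency check this implies can be sketched for a single group-stage entry. The field names come from the response format above; the function `validate_group_match`, the assumption that confidence lies in [0, 1], and the error messages are illustrative only and are not the repository's validator.

```python
REQUIRED = {
    "home_team", "away_team", "predicted_home_goals", "predicted_away_goals",
    "predicted_outcome", "predicted_winner", "confidence", "reasoning_short",
}


def validate_group_match(entry: dict) -> list[str]:
    """Return a list of problems found in one group-stage prediction."""
    missing = REQUIRED - set(entry)
    if missing:
        return [f"missing fields: {sorted(missing)}"]
    home, away = entry["predicted_home_goals"], entry["predicted_away_goals"]
    if not all(type(g) is int and g >= 0 for g in (home, away)):
        return ["goals must be non-negative integers"]
    # The outcome and winner must agree with the predicted scoreline.
    if home > away:
        outcome, winner = "HOME_WIN", entry["home_team"]
    elif away > home:
        outcome, winner = "AWAY_WIN", entry["away_team"]
    else:
        outcome, winner = "DRAW", "DRAW"
    errors = []
    if entry["predicted_outcome"] != outcome:
        errors.append(f"predicted_outcome should be {outcome}")
    if entry["predicted_winner"] != winner:
        errors.append(f"predicted_winner should be {winner}")
    conf = entry["confidence"]
    # Assumption: confidence is a probability-like value in [0, 1].
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        errors.append("confidence must be a number between 0 and 1")
    if len(str(entry["reasoning_short"]).split()) > 40:
        errors.append("reasoning_short exceeds 40 words")
    return errors


good = {
    "home_team": "Team A", "away_team": "Team B",
    "predicted_home_goals": 2, "predicted_away_goals": 1,
    "predicted_outcome": "HOME_WIN", "predicted_winner": "Team A",
    "confidence": 0.6, "reasoning_short": "Stronger recent form.",
}
bad = dict(good, predicted_outcome="DRAW", confidence=1.4)
print(validate_group_match(good))  # []
print(validate_group_match(bad))
```

Returning a list of messages rather than raising on the first problem makes it possible to flag every inconsistency in a response before scoring.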
Predictions can be evaluated gradually as World Cup matches are completed.
Scoring rules are visible, deterministic, and applied equally to all models.
Search-enabled assistants should be evaluated separately unless all models are allowed to use web search.
- Football outcomes are highly uncertain.
- LLMs may hallucinate unavailable statistics.
- Free model versions can change over time.
- Some assistants may silently use web search or hidden tools.
- Long full-tournament prompts may be handled differently by different models.
- The benchmark depends on the quality and availability of football data sources.
AI World Cup is a research, benchmarking, and visualization project. It is not betting advice, financial advice, or a guarantee of any real-world match outcome.
Project created by Jonaid Shianifar and Iias Faiud.
The public website and project documentation were created and refined with support from ChatGPT 5.5 Plus.
This project's source code is licensed under the MIT License.
Data, fixtures, team names, logos, competition information, and other football-related content may come from third-party sources and may be subject to their own licenses, terms of use, or attribution requirements.
AI World Cup is an independent project and is not affiliated with FIFA.
