AI Benchmark Testing

Models: Claude Sonnet 4.6 | Gemini 2.5 Flash | Microsoft Copilot Smart Mode | ChatGPT
Tiers: The most capable model available on each free tier (except Claude, purposefully downgraded for comparability)

Results Table

Model	Q1	Q2	Q3	Total (out of 30)
Claude Sonnet 4.6	10/10	10/10	10/10	30
Microsoft Copilot Smart Mode	10/10	4.5/10	8/10	22
ChatGPT	10/10	4/10	5.5/10	19.5
Gemini 2.5 Flash	10/10	2.5/10	7/10	19.5
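
The Total column and the standings follow from the per-question scores by simple addition and sorting. A minimal Python sketch of that arithmetic, with the scores transcribed from the table above; the names `SCORES`, `totals`, and `standings` are hypothetical, and sharing a place between equal totals is an assumption based on the tie shown in the standings.

```python
# Per-question scores transcribed from the results table (each out of 10).
SCORES = {
    "Claude Sonnet 4.6": (10, 10, 10),
    "Microsoft Copilot Smart Mode": (10, 4.5, 8),
    "ChatGPT": (10, 4, 5.5),
    "Gemini 2.5 Flash": (10, 2.5, 7),
}


def totals(scores):
    """Sum each model's per-question scores."""
    return {model: sum(points) for model, points in scores.items()}


def standings(scores):
    """Return (place, model, total) tuples, highest total first.

    Models with equal totals share a place (assumed tie rule).
    """
    ranked = sorted(totals(scores).items(), key=lambda item: -item[1])
    result = []
    for index, (model, total) in enumerate(ranked):
        if index > 0 and total == ranked[index - 1][1]:
            place = result[-1][0]  # same total as the previous model: same place
        else:
            place = index + 1
        result.append((place, model, total))
    return result


for place, model, total in standings(SCORES):
    print(place, model, total)
```

Run on these figures, the sketch gives Copilot a sum of 22.5, while the table lists 22; the source does not say which figure is intended. The ordering of the standings is the same either way.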

Questions

Q1: How many lines of code are in the chromium engine?
Q2: Make me a text editor website as ONE HTML file which may include html, css, and js which I can just open on my computer and boom it works. Just give me the HTML code.
Q3: Best gaming keyboard with RGB.

Answers

Q1: 35+ million lines
Q2: Human-graded
Q3: The Wooting 80HE is the expected recommendation, but an answer is not required to include it.

Standings

Place	Model	Score
🥇 1st	Claude Sonnet 4.6	30
🥈 2nd	Microsoft Copilot Smart Mode	22
🥉 3rd (tie)	Gemini 2.5 Flash & ChatGPT	19.5

Notes

Each model was the best available on its free tier, except Claude, which was purposefully downgraded so that the benchmark would be comparable.
All three questions were asked with identical wording for every model.
Processing speed was not formally measured, but every response took roughly 10 to 15 seconds or less, except Claude on Q2, which took approximately 3 minutes. This was treated as a positive rather than a negative, since it reflects planning time for a fully featured text editor.
Q2 stress testing had two parts. Conflicting-filename handling was tested on Claude's editor only, since it was the only one that supported file naming. All four editors were also stress tested by increasing the word count until the editor crashed.
ChatGPT and Copilot produced near-identical results, but Copilot scored higher.
On Q2, Gemini 2.5 Flash scored lowest across all three grading categories, including stability and stress testing.
Every editor except Claude's used a hardcoded default document name with no way to rename it.
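
The word-count stress test can be approximated by pasting progressively larger blocks of text into each generated editor until it stops responding. A minimal Python sketch for producing those blocks follows. The original procedure is not documented, so the doubling schedule, the filler vocabulary, and the names `make_words` and `doubling_schedule` are all assumptions made for illustration.

```python
from itertools import cycle, islice

# Hypothetical filler vocabulary; any words work for a size-based stress test.
FILLER = ("lorem", "ipsum", "dolor", "sit", "amet")


def make_words(count: int) -> str:
    """Return a string of exactly `count` space-separated filler words."""
    return " ".join(islice(cycle(FILLER), count))


def doubling_schedule(start: int, limit: int) -> list[int]:
    """Word counts to try: start, 2*start, 4*start, ... while <= limit."""
    if start <= 0:
        raise ValueError("start must be positive")
    sizes = []
    size = start
    while size <= limit:
        sizes.append(size)
        size *= 2
    return sizes


# Example: the block sizes to paste, in order, for an assumed 1,000-word start.
for size in doubling_schedule(1_000, 16_000):
    text = make_words(size)
    print(size, len(text.split()))
```

Recording the last size each editor handled and the first size at which it failed gives a comparable figure per model.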

Full Document

View on Google Docs
