Which AI Model Writes the Best SQL for D1?
MyD1 ships with an AI SQL assistant that works with any OpenAI-compatible API. That means you can plug in Claude, GPT-4, a local Ollama model, or anything in between. But which model actually writes the best SQL?
We ran 8 popular models through 4 rounds of increasingly difficult SQL tasks — all targeting Cloudflare D1 (SQLite). Each round was scored out of 10 based on correctness, efficiency, and whether the query actually runs.
The Models
Premium Tier
Claude 3.5 Sonnet — Anthropic's go-to coding model.
GPT-4o — OpenAI's flagship multimodal model.
Gemini 1.5 Pro — Google's long-context model.
Mid Tier
GPT-4o-mini — OpenAI's affordable option.
DeepSeek V3 — Open-source, competitive pricing.
Mistral Large — European-made, strong reasoning.
Free / Local
Llama 3.1 70B — Meta's large open model, via Ollama.
Llama 3.1 8B — Same family, runs on any Mac, via Ollama.
Round 1: Simple Queries

We asked each model to write 8 simple queries against a standard e-commerce schema: users, orders, products. Things like "find all orders placed this week" or "count users by country, sorted descending."
Every model nailed this. The differences were minor — mostly about aliasing style, whether they used LIMIT vs FETCH FIRST, and how clean the formatting was.
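For reference, here is the kind of Round 1 query in question, run against a hypothetical minimal schema. Since D1 is SQLite-based, Python's built-in sqlite3 module is a reasonable local stand-in for checking syntax (the table and data below are invented for illustration):

```python
import sqlite3

# In-memory SQLite database with a toy users table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT);
INSERT INTO users (country) VALUES ('US'), ('US'), ('DE'), ('US'), ('DE'), ('FR');
""")

# "Count users by country, sorted descending" -- the kind of query
# every model in Round 1 handled without trouble.
rows = conn.execute("""
    SELECT country, COUNT(*) AS n
    FROM users
    GROUP BY country
    ORDER BY n DESC
""").fetchall()
print(rows)  # [('US', 3), ('DE', 2), ('FR', 1)]
```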
Round 2: Time Series

Time series is where D1's SQLite dialect starts to matter. There's no DATE_TRUNC, no GENERATE_SERIES, no INTERVAL. Models need to know that D1 uses strftime() and date() instead.
This is where the gap starts opening. Premium models understood the SQLite constraint and used WITH RECURSIVE to generate a date range. Smaller models either used PostgreSQL syntax (which doesn't run on D1) or just skipped the "zero days" requirement entirely.
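A sketch of the recursive-CTE approach the stronger models used, again tested locally with sqlite3 as a D1 stand-in (schema, dates, and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT);
INSERT INTO orders (created_at) VALUES
  ('2024-03-01 09:00:00'), ('2024-03-01 14:30:00'), ('2024-03-03 11:15:00');
""")

# Daily order counts over a fixed range, with zero-filled gaps.
# WITH RECURSIVE stands in for PostgreSQL's GENERATE_SERIES,
# and strftime() replaces DATE_TRUNC.
rows = conn.execute("""
    WITH RECURSIVE days(d) AS (
        SELECT '2024-03-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < '2024-03-03'
    )
    SELECT days.d, COUNT(orders.id) AS order_count
    FROM days
    LEFT JOIN orders ON strftime('%Y-%m-%d', orders.created_at) = days.d
    GROUP BY days.d
    ORDER BY days.d
""").fetchall()
print(rows)  # [('2024-03-01', 2), ('2024-03-02', 0), ('2024-03-03', 1)]
```

The LEFT JOIN against the generated series is what satisfies the "zero days" requirement that the smaller models skipped.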
DATE_TRUNC('day', created_at) is valid PostgreSQL, but D1 rejects it. The correct D1 syntax is strftime('%Y-%m-%d', created_at). Claude and GPT-4o got this right consistently.

Round 3: Complex JOINs

We gave models a 6-table schema (users, orders, order_items, products, categories, reviews) and asked for queries like "find products with above-average reviews that haven't been ordered in 90 days, grouped by category."
This round separates models that understand SQL from models that pattern-match SQL. Smaller models started producing queries with incorrect JOIN orders, missing GROUP BY columns, or subqueries that reference the wrong alias.
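To make the task concrete, here is a simplified sketch of the "above-average reviews, not ordered in 90 days" query, against an invented cut of the 6-table schema (table contents and names are hypothetical; sqlite3 again stands in for D1):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products    (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
CREATE TABLE reviews     (product_id INTEGER, rating REAL);
CREATE TABLE orders      (id INTEGER PRIMARY KEY, created_at TEXT);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);

INSERT INTO categories  VALUES (1, 'Audio'), (2, 'Video');
INSERT INTO products    VALUES (1, 'Headphones', 1), (2, 'Speaker', 1), (3, 'Webcam', 2);
INSERT INTO reviews     VALUES (1, 5.0), (2, 2.0), (3, 4.0);  -- global avg is ~3.67
INSERT INTO orders      VALUES (1, date('now', '-10 days'));
INSERT INTO order_items VALUES (1, 3);                         -- only the Webcam sold recently
""")

# Products rated above the overall average with no order in the last
# 90 days, labeled by category. Correlated subquery + NOT EXISTS is
# the pattern the weaker models tended to garble.
rows = conn.execute("""
    SELECT c.name AS category, p.name AS product
    FROM products p
    JOIN categories c ON c.id = p.category_id
    WHERE (SELECT AVG(r.rating) FROM reviews r WHERE r.product_id = p.id)
          > (SELECT AVG(rating) FROM reviews)
      AND NOT EXISTS (
          SELECT 1
          FROM order_items oi
          JOIN orders o ON o.id = oi.order_id
          WHERE oi.product_id = p.id
            AND o.created_at >= date('now', '-90 days')
      )
    ORDER BY c.name, p.name
""").fetchall()
print(rows)  # [('Audio', 'Headphones')]
```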
Round 4: Schema Migration

The hardest round. We gave each model a real PostgreSQL schema with SERIAL, UUID, JSONB, ENUM, TIMESTAMP WITH TIME ZONE, array columns, and partial indexes — then asked them to convert it to D1-compatible SQLite.
This round is brutal because it requires deep knowledge of both PostgreSQL and SQLite/D1 dialects. Models need to know that D1 doesn't support ENUM (use CHECK constraints), that JSONB becomes TEXT with json_extract(), that SERIAL becomes INTEGER PRIMARY KEY (autoincrement), and that D1 has no UUID type.
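The core substitutions look like this in practice. The table below is an invented example, not one of the benchmark schemas, and sqlite3 again stands in for D1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A Postgres table along the lines of
#   CREATE TABLE users (id SERIAL PRIMARY KEY, uuid UUID,
#                       role user_role, meta JSONB);
# translates to D1/SQLite roughly as:
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,   -- SERIAL -> INTEGER PRIMARY KEY (auto-assigned)
    uuid TEXT,                  -- no UUID type; store as TEXT
    role TEXT CHECK (role IN ('admin', 'member', 'guest')),  -- ENUM -> CHECK
    meta TEXT                   -- JSONB -> TEXT, queried via json_extract()
);
INSERT INTO users (uuid, role, meta) VALUES
  ('u-0001', 'admin', '{"plan": "pro", "beta": true}');
""")

plan = conn.execute(
    "SELECT json_extract(meta, '$.plan') FROM users WHERE role = 'admin'"
).fetchone()[0]
print(plan)  # pro
```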
Even the top models lost points here, on details like JSONB migration strategy or D1-specific constraint limitations. This is the kind of task where you'll need to review and fix the output regardless of which model you use.

Final Leaderboard
| Model | R1 Simple | R2 Time Series | R3 JOINs | R4 Migration | Total |
|---|---|---|---|---|---|
| Claude 3.5 Sonnet | 9.5 | 8.5 | 7.5 | 6.0 | 31.5 |
| GPT-4o | 9.5 | 8.0 | 7.0 | 5.5 | 30.0 |
| DeepSeek V3 | 9.0 | 7.5 | 6.5 | 5.0 | 28.0 |
| Gemini 1.5 Pro | 9.0 | 7.0 | 6.0 | 4.5 | 26.5 |
| Mistral Large | 9.0 | 7.0 | 5.5 | 4.0 | 25.5 |
| GPT-4o-mini | 8.5 | 6.0 | 5.0 | 3.5 | 23.0 |
| Llama 3.1 70B | 8.5 | 5.5 | 4.0 | 2.5 | 20.5 |
| Llama 3.1 8B | 7.5 | 3.5 | 2.0 | 1.0 | 14.0 |
Score Drop by Round
The harder the task, the more the scores diverge. Averaged across all eight models, scores fell from 8.8 in Round 1 to 6.6 in Round 2, 5.4 in Round 3, and 4.0 in Round 4.
Our Recommendation
For everyday use (free)
- Llama 3.1 70B via Ollama — handles 80% of real-world queries
- Simple SELECTs, basic JOINs, WHERE clauses
- Runs locally, no API costs, no data leaves your Mac
- Download free at ollama.com
For complex work (paid)
- Claude 3.5 Sonnet — best D1 dialect awareness
- Multi-table JOINs, CTEs, time series, migrations
- Use via API key or through OpenRouter
- Worth the cost when accuracy matters
Our suggested setup
Ollama + Claude
Use Ollama daily for free. Switch to Claude for the hard stuff.
Bottom line: Don't overpay for simple queries. A free local model handles the majority of everyday SQL work. But when you're dealing with complex JOINs, D1-specific time series, or Postgres-to-D1 migrations — a premium model pays for itself in time saved debugging.
MyD1's AI assistant lets you switch between providers in seconds. Start with Ollama and upgrade when you need to. Browse the free models library to get started.
Related: Why We Built MyD1 · D1 vs MySQL vs PostgreSQL