Models
The frontier AI models, explained
The right pick is rarely the one topping a leaderboard — it's the one that fits your workload, budget and constraints. These are the models and the labs worth knowing.
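As a rough illustration of picking by fit rather than by leaderboard rank, the sketch below shortlists models by context window alone. The figures are the rounded values listed on this page for a subset of models (so "1.0M" is taken as 1,000,000 tokens). The `shortlist` helper is hypothetical, and so is the rule that prompt and output must share one window, since limits can be counted differently per provider. Treat it as a first filter, not a recommendation engine.

```python
# Context windows in tokens, copied from the rounded figures listed on this page.
# Assumption: "1M" / "1.0M" means 1,000,000 and "1.1M" means 1,100,000.
CONTEXT_WINDOWS = {
    "Claude Opus 4.7": 1_000_000,
    "Claude Sonnet 4.6": 1_000_000,
    "Claude Haiku 4.5": 200_000,
    "GPT-5.4": 1_100_000,
    "GPT-5.4 Mini": 400_000,
    "Gemini 2.5 Pro": 1_000_000,
    "Gemini 2.5 Flash": 1_000_000,
    "Llama 4 Scout": 328_000,
    "Llama 4 Maverick": 1_000_000,
}


def shortlist(prompt_tokens: int, output_tokens: int = 0) -> list[str]:
    """Return models whose listed window fits prompt plus output, smallest window first.

    Hypothetical helper: it assumes prompt and output tokens share one window.
    """
    needed = prompt_tokens + output_tokens
    fits = [
        (window, name)
        for name, window in CONTEXT_WINDOWS.items()
        if window >= needed
    ]
    # Sort by window size, then by name, so the tightest fit comes first.
    return [name for _window, name in sorted(fits)]


if __name__ == "__main__":
    # A 300K-token workload rules out the 200K model and keeps the rest.
    for name in shortlist(300_000):
        print(name)
```

Context size is only one constraint. Budget, latency, data residency and self-hosting needs would each add another filter on top of this one.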
San Francisco, CA
The safety-first AI lab that made alignment research a precondition for building. Claude models are known for disciplined instruction following, precise tool use, and long context windows (200K to 1M tokens in the models listed here) that can take in a large codebase in one pass.
- 1M
Claude Opus 4.7
Anthropic's heavyweight for hard reasoning and agentic work.
- 1M
Claude Sonnet 4.6
The pragmatic default — Claude quality without Opus pricing.
- 200K
Claude Haiku 4.5
Fast, cheap, surprisingly capable for high-volume jobs.
San Francisco, CA
The lab that launched the current LLM era. GPT and o-series models anchor the widest developer ecosystem in the field — most tutorials, integrations, and third-party tooling start here.
- 1.1M
GPT-5.4
OpenAI's flagship — broadest modality and ecosystem coverage.
- 400K
GPT-5.4 Mini
GPT-5 economics for high-volume routine tasks.
Mountain View, CA
DeepMind and Google Brain, unified. The Gemini family brings native video and audio understanding and context windows up to 2M tokens — multimodal infrastructure at a scale no other lab matches.
- 1.0M
Gemini 3.1 Pro (Preview)
Google's latest frontier model with expanded reasoning.
- 1.0M
Gemini 2.5 Pro
Google's bet on massive context and native multimodality.
- 1.0M
Gemini 2.5 Flash
Cheap multimodal at million-token scale.
Menlo Park, CA
Open weights by default. Meta releases Llama model weights publicly, giving teams full control to self-host, fine-tune and deploy frontier-grade models without API lock-in or per-token pricing.
- 328K
Llama 4 Scout
Open-weights frontier with a headline 10M-token context.
- 1.0M
Llama 4 Maverick
The bigger Llama 4 — frontier quality you can self-host.
Paris, France
The European challenger. Mistral builds efficient models with data residency and sovereignty as first-class concerns — clean function calling and GDPR-native infrastructure for teams where compliance is a hard requirement.
Hangzhou, China
The cost curve disruptor. DeepSeek challenged the assumption that frontier reasoning requires frontier pricing, then released the weights publicly, turning its advantage into a floor anyone can build on.
Hangzhou, China
China's open-weights powerhouse. The Qwen family spans 0.5B to 72B parameters across text, vision, coding and math, with standout multilingual capability, especially in Chinese, that closed Western APIs can't match.
San Francisco, CA
Elon Musk's AI venture. Grok models differentiate on real-time X integration and inference-time compute — a unique angle for teams that need live web context beyond a training cutoff date.
Toronto, Canada
The enterprise retrieval specialist. Cohere focuses on retrieval-augmented generation and tool-calling rather than topping leaderboards. Command R+ is built for citation-accurate pipelines, with open weights so you're never locked in.
Legacy
Models we still cover for context but no longer recommend for new work.