Model comparison

Gemini 2.5 Pro vs Gemini 3.1 Pro

The main observable difference is price: Gemini 3.1 Pro costs more per million tokens than Gemini 2.5 Pro for both input ($2.00 vs $1.25) and output ($12.00 vs $10.00), while the two share the same context window and modality support.

Google

Gemini 2.5 Pro

Google's bet on massive context and native multimodality.

Google

Gemini 3.1 Pro

Google's latest frontier model with expanded reasoning.

Specs

Metric	Gemini 2.5 Pro	Gemini 3.1 Pro
Context window	1.0M tokens	1.0M tokens
Input price (USD per 1M tokens)	$1.25	$2.00
Output price (USD per 1M tokens)	$10.00	$12.00
Modalities	Text · Image · File · Audio · Video	Text · Image · File · Audio · Video
Open weights	No	No

How they differ

Cost profile

Gemini 2.5 Pro

Gemini 2.5 Pro is priced at $1.25 per million input tokens and $10.00 per million output tokens.

Gemini 3.1 Pro

Gemini 3.1 Pro is priced at $2.00 per million input tokens and $12.00 per million output tokens, which is 60% more on input and 20% more on output than Gemini 2.5 Pro.
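
The per-million prices above translate into per-request costs with simple arithmetic. The following is a minimal sketch, assuming flat per-token rates with no tiering, caching, or batch discounts (the comparison does not mention any). The `request_cost` helper and the example token counts are hypothetical and only illustrate the calculation.

```python
from decimal import Decimal

# Prices in USD per 1M tokens, copied from the Specs table.
PRICES = {
    "Gemini 2.5 Pro": {"input": Decimal("1.25"), "output": Decimal("10.00")},
    "Gemini 3.1 Pro": {"input": Decimal("2.00"), "output": Decimal("12.00")},
}

MILLION = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request, assuming flat per-token rates."""
    price = PRICES[model]
    total = price["input"] * input_tokens + price["output"] * output_tokens
    return total / MILLION


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 5,000 output tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 200_000, 5_000):.2f}")
```

For this hypothetical workload the estimate is $0.30 on Gemini 2.5 Pro and $0.46 on Gemini 3.1 Pro. Because the input premium (60%) is larger than the output premium (20%), input-heavy workloads see the bigger relative increase.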

Context handling

Gemini 2.5 Pro

Gemini 2.5 Pro supports a context window of 1,048,576 tokens and is reported to perform stably across the full window.

Gemini 3.1 Pro

Gemini 3.1 Pro supports the same 1,048,576-token context window and is reported to stay more consistent and coherent in multi-turn interactions near the context limit.
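
Since both models share the same window, a budget check written for one applies to the other. The sketch below assumes token counts are already known (it does not call any tokenizer or API), and the helper names are hypothetical. It also shows why the table rounds the figure to 1.0M: 1,048,576 is exactly 2^20.

```python
# Context window shared by both models, per the figures above (2**20 tokens).
CONTEXT_WINDOW_TOKENS = 2**20


def context_headroom(used_tokens: int) -> int:
    """Return the tokens left before the window is full (negative if over)."""
    return CONTEXT_WINDOW_TOKENS - used_tokens


def fits_in_context(used_tokens: int) -> bool:
    """Return True if the given token count fits within the window."""
    return used_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Hypothetical conversation that has accumulated 1,000,000 tokens.
    used = 1_000_000
    print(fits_in_context(used), context_headroom(used))
```

Whether output tokens count against the same window is not stated in the comparison, so this check covers only the tokens passed in.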

Vision

Gemini 2.5 Pro

Gemini 2.5 Pro accepts text, image, file, audio, and video input.

Gemini 3.1 Pro

Gemini 3.1 Pro accepts the same input modalities, with improved OCR and more coherent handling of mixed-modality input.

Gemini 2.5 Pro — what sets it apart

+Lower input and output prices ($1.25 and $10.00 per million tokens, versus $2.00 and $12.00), making it the more budget-friendly option.
+Slightly lower average latency.

Gemini 3.1 Pro — what sets it apart

+Better coherence over extended multi-turn interactions.
+Improved handling of coding tasks and tighter multimodal integration.

Price is the most consequential difference: Gemini 3.1 Pro costs more than Gemini 2.5 Pro on both input and output, a premium that may be justified by its improved reasoning and multimodal capabilities.

Analysis synthesized from gpt-4o, llama-4-maverick, phi-4, etc.
