latentbrief

Model comparison

GPT-5.4 vs Gemini 3.1 Pro

The most significant difference is that Gemini 3.1 Pro accepts audio and video inputs, which GPT-5.4 does not.

Specs

Metric               GPT-5.4               Gemini 3.1 Pro
Context window       1,050,000 tokens      1,048,576 tokens
Input $/1M tokens    $2.50                 $2.00
Output $/1M tokens   $15.00                $12.00
Modalities           Text · Image · File   Text · Image · File · Audio · Video
Open weights         No                    No

How they differ

Reasoning approach

GPT-5.4

GPT-5.4 is optimized for text-based reasoning, with additional support for image and file inputs.

Gemini 3.1 Pro

Gemini 3.1 Pro is designed for multimodal reasoning across text, audio, image, and video inputs.

Context handling

GPT-5.4

GPT-5.4 supports up to 1,050,000 tokens of context, used mainly for text and image data.

Gemini 3.1 Pro

Gemini 3.1 Pro supports up to 1,048,576 tokens of context, which can include multimodal inputs.
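To make the size of the gap concrete, here is a minimal sketch that checks whether a request fits in each window. The names `CONTEXT_LIMITS` and `fits_in_context` are hypothetical helpers, not part of any vendor SDK. Whether reserved output tokens count against the window is treated here as an assumption, and token counts for the same text can differ between the two models because each uses its own tokenizer.

```python
# Context window limits in tokens, as stated in the comparison above.
CONTEXT_LIMITS = {
    "GPT-5.4": 1_050_000,
    "Gemini 3.1 Pro": 1_048_576,  # equal to 2**20
}


def fits_in_context(model: str, prompt_tokens: int, reserved_output_tokens: int = 0) -> bool:
    """Return True if the prompt plus any reserved output stays within the window.

    Assumption: output tokens share the same window as the prompt.
    """
    return prompt_tokens + reserved_output_tokens <= CONTEXT_LIMITS[model]


# Difference between the two stated limits.
gap = CONTEXT_LIMITS["GPT-5.4"] - CONTEXT_LIMITS["Gemini 3.1 Pro"]
print(gap)  # 1424
```

The difference is 1,424 tokens, roughly 0.14% of either window, so it only matters for prompts that sit right at the limit.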

Cost profile

GPT-5.4

GPT-5.4 costs $2.50 per 1M input tokens and $15.00 per 1M output tokens.

Gemini 3.1 Pro

Gemini 3.1 Pro costs $2.00 per 1M input tokens and $12.00 per 1M output tokens.
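As a rough illustration of how these rates translate into per-request cost, the sketch below applies the listed per-1M-token prices to a hypothetical workload. `PRICES`, `request_cost`, and the example token counts are illustrative assumptions; an actual bill may also depend on caching, tiered pricing, or other factors not covered in this comparison.

```python
from decimal import Decimal

# Per-1M-token prices in USD, copied from the comparison above.
PRICES = {
    "GPT-5.4": {"input": Decimal("2.50"), "output": Decimal("15.00")},
    "Gemini 3.1 Pro": {"input": Decimal("2.00"), "output": Decimal("12.00")},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the listed flat rates."""
    rates = PRICES[model]
    million = Decimal(1_000_000)
    return (rates["input"] * input_tokens + rates["output"] * output_tokens) / million


# Hypothetical workload: 200,000 input tokens and 5,000 output tokens.
for name in PRICES:
    print(name, request_cost(name, 200_000, 5_000))
```

For this workload the estimate is $0.575 for GPT-5.4 and $0.46 for Gemini 3.1 Pro. Because both Gemini 3.1 Pro prices are 80% of the GPT-5.4 prices, the 20% saving holds for any mix of input and output tokens at these rates.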

Vision

GPT-5.4

GPT-5.4 supports image inputs but does not include video processing.

Gemini 3.1 Pro

Gemini 3.1 Pro supports visual inputs in both image and video form.
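The modality differences can be expressed as a simple lookup when deciding which model can take a given request. This is a sketch only: `MODALITIES` mirrors the Specs table, and `models_supporting` is a hypothetical helper, not an API from either provider.

```python
from typing import Iterable, List

# Input modalities per model, taken from the Specs table.
MODALITIES = {
    "GPT-5.4": frozenset({"text", "image", "file"}),
    "Gemini 3.1 Pro": frozenset({"text", "image", "file", "audio", "video"}),
}


def models_supporting(required: Iterable[str]) -> List[str]:
    """Return the models whose listed input modalities cover every required one."""
    needed = set(required)
    return [name for name, supported in MODALITIES.items() if needed <= supported]


print(models_supporting(["text", "image"]))  # ['GPT-5.4', 'Gemini 3.1 Pro']
print(models_supporting(["text", "video"]))  # ['Gemini 3.1 Pro']
```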

GPT-5.4 — what sets it apart

  • Has a marginally larger context window at 1,050,000 tokens, 1,424 more than Gemini 3.1 Pro's 1,048,576.
  • Focuses on text, image, and file inputs; it does not accept audio or video.

Gemini 3.1 Pro — what sets it apart

  • Supports audio and video inputs, enabling richer multimodal interaction.
  • Costs 20% less for both input ($2.00 vs. $2.50 per 1M tokens) and output ($12.00 vs. $15.00 per 1M tokens).

The most consequential difference is that Gemini 3.1 Pro handles audio and video inputs, while GPT-5.4 focuses on text and image tasks and has a slightly larger context window (1,050,000 vs. 1,048,576 tokens).

Analysis synthesized from gpt-4o, llama-4-maverick, phi-4, etc.