On March 5, 2026, OpenAI unveiled GPT-5.4, its most advanced and effective frontier model to date, combining sophisticated reasoning, coding, and agentic workflows in a single system.
The model is rolling out across ChatGPT (as GPT-5.4 Thinking), the API, and Codex, with a higher-performance GPT-5.4 Pro version available for users who need maximum compute on complex tasks.
GPT-5.4 consolidates capabilities that were previously spread across several models, combining the industry-leading coding capabilities of GPT-5.3-Codex with stronger general reasoning and native computer-use skills.
The result is a model built for end-to-end professional workflows, from spreadsheets and presentations to complex multi-step agentic tasks, reducing the back-and-forth required from users.
In ChatGPT, GPT-5.4 Thinking opens with an upfront outline of its reasoning, so users can step in and redirect the model mid-response without restarting, which yields more precise and contextually relevant outputs. This real-time steering is a significant change from earlier reasoning models, where any adjustment meant starting over.
GPT-5.4 sets new high scores on several key industry benchmarks:
| Benchmark | GPT-5.4 | GPT-5.3-Codex | GPT-5.2 |
|---|---|---|---|
| GDPval (wins or ties) | 83.0% | 70.9% | 70.9% |
| SWE-Bench Pro (Public) | 57.7% | 56.8% | 55.6% |
| OSWorld-Verified | 75.0% | 74.0% | 47.3% |
| Toolathlon | 54.6% | 51.9% | 46.3% |
| BrowseComp | 82.7% | 77.3% | 65.8% |
On GDPval, which evaluates agents across 44 roles spanning the top 9 U.S. GDP sectors, GPT-5.4 matches or exceeds industry specialists in 83.0% of comparisons, up from 70.9% for GPT-5.2.
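To put the table's figures in perspective, the gains over GPT-5.2 can be expressed both in percentage points and relative to the older score. The sketch below uses only the numbers from the table; the function and variable names are illustrative and not part of any OpenAI tooling.

```python
# Scores copied from the benchmark table above (percent).
SCORES = {
    "GDPval (wins or ties)": {"GPT-5.4": 83.0, "GPT-5.3-Codex": 70.9, "GPT-5.2": 70.9},
    "SWE-Bench Pro (Public)": {"GPT-5.4": 57.7, "GPT-5.3-Codex": 56.8, "GPT-5.2": 55.6},
    "OSWorld-Verified": {"GPT-5.4": 75.0, "GPT-5.3-Codex": 74.0, "GPT-5.2": 47.3},
    "Toolathlon": {"GPT-5.4": 54.6, "GPT-5.3-Codex": 51.9, "GPT-5.2": 46.3},
    "BrowseComp": {"GPT-5.4": 82.7, "GPT-5.3-Codex": 77.3, "GPT-5.2": 65.8},
}


def point_gain(benchmark, new="GPT-5.4", old="GPT-5.2"):
    """Absolute gain in percentage points, rounded to one decimal."""
    row = SCORES[benchmark]
    return round(row[new] - row[old], 1)


def relative_gain(benchmark, new="GPT-5.4", old="GPT-5.2"):
    """Gain as a percentage of the older model's score, rounded to one decimal."""
    row = SCORES[benchmark]
    return round(100 * (row[new] - row[old]) / row[old], 1)


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name}: +{point_gain(name)} points, +{relative_gain(name)}% relative")
```

The distinction matters when reading headlines: the GDPval improvement is 12.1 percentage points, which is a smaller number than the same change stated as a relative gain.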
On BigLaw Bench, which focuses on legal document tasks, the model scored 91%, according to Niko Grupen, Head of Applied Research at Harvey.
GPT-5.4 is OpenAI’s first general-purpose model with built-in computer-use capabilities, allowing agents to interact directly with software through screenshots, mouse actions, and keyboard commands.
On OSWorld-Verified, it achieves a 75.0% success rate, surpassing the human baseline of 72.4% and far exceeding GPT-5.2’s 47.3%.
On WebArena-Verified, GPT-5.4 reaches a browser success rate of 67.3%, and it attains 92.8% on Online-Mind2Web using screenshot-based observations alone.
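The screenshot, mouse, and keyboard interaction described above is commonly structured as an observe-act loop. The following is a minimal sketch of that general pattern, not OpenAI's API: `Action`, `FakeDesktop`, `run_agent`, and `scripted_policy` are all hypothetical names, and a scripted stand-in replaces the model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str        # "click", "type", or "done" (hypothetical action set)
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Stand-in for a real screen; it only records what the agent did."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real environment would return image bytes; this returns a summary.
        return f"screen after {len(self.log)} actions"

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x},{action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        else:
            raise ValueError(f"unsupported action: {action.kind}")


def run_agent(desktop: FakeDesktop,
              policy: Callable[[str], Action],
              max_steps: int = 10) -> int:
    """Observe, ask the policy for an action, apply it; return actions applied."""
    for step in range(max_steps):
        observation = desktop.screenshot()
        action = policy(observation)
        if action.kind == "done":
            return step
        desktop.apply(action)
    return max_steps


def scripted_policy() -> Callable[[str], Action]:
    """Replays a fixed script; a real agent would query a model here."""
    script = iter([
        Action("click", x=120, y=40),
        Action("type", text="hello"),
        Action("done"),
    ])
    return lambda observation: next(script)


if __name__ == "__main__":
    desktop = FakeDesktop()
    steps = run_agent(desktop, scripted_policy())
    print(steps, desktop.log)
```

The `max_steps` cap is a design choice worth keeping in any real loop of this kind, since an agent that never emits a terminating action would otherwise run indefinitely.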
The model also supports a 1-million-token context window in the API, enabling long-horizon task execution across extended agent workflows and matching the context window offerings from Google and Anthropic.
OpenAI highlighted that GPT-5.4 is its most accurate model to date, with individual claims 33% less likely to be false and full responses 18% less likely to contain errors compared to GPT-5.2.
The model also delivers substantial token-efficiency gains, using significantly fewer tokens to solve comparable reasoning problems, which translates directly into lower API costs and faster responses for enterprise developers.
In production use, Mainstay CEO Dod Fraser reported that GPT-5.4 achieved a 95% first-attempt success rate across roughly 30,000 property portals, completing sessions three times faster and using 70% fewer tokens than earlier computer-use models.
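As a rough illustration of what "70% fewer tokens" and "three times faster" mean per session, the sketch below applies those two ratios to made-up baseline figures. The baseline of 100,000 tokens, the 90-second duration, and the price of $10 per million tokens are assumptions chosen for easy arithmetic, not reported values or actual OpenAI pricing.

```python
def tokens_after_reduction(baseline_tokens: float, reduction: float = 0.70) -> float:
    """Tokens used after cutting the baseline by the given fraction."""
    return baseline_tokens * (1 - reduction)


def duration_after_speedup(baseline_seconds: float, speedup: float = 3.0) -> float:
    """Session duration when work completes `speedup` times faster."""
    return baseline_seconds / speedup


def session_cost(tokens: float, price_per_million: float) -> float:
    """Cost of a session at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    BASELINE_TOKENS = 100_000   # assumed, for illustration only
    BASELINE_SECONDS = 90.0     # assumed, for illustration only
    PRICE_PER_MILLION = 10.0    # assumed, not actual pricing

    new_tokens = tokens_after_reduction(BASELINE_TOKENS)
    print("tokens:", BASELINE_TOKENS, "->", round(new_tokens))
    print("seconds:", BASELINE_SECONDS, "->", duration_after_speedup(BASELINE_SECONDS))
    print("cost:", session_cost(BASELINE_TOKENS, PRICE_PER_MILLION),
          "->", round(session_cost(new_tokens, PRICE_PER_MILLION), 2))
```

Under a flat per-token price, cost scales linearly with token count, so a 70% token reduction implies a 70% lower bill for the same workload regardless of the price assumed.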
GPT-5.4 Thinking is now available to ChatGPT Plus, Team, and Pro subscribers and is set to succeed GPT-5.2 Thinking within the next three months. Developers can access GPT-5.4 and GPT-5.4 Pro through the OpenAI API, with expedited processing available for higher token speeds in real-world applications.