AI Agent Frontier models 2026.06.27

GPT-5.6 Sol, Terra & Luna: Full Review, Benchmarks, Pricing & Access Guide (2026)

On June 26, 2026, OpenAI released its largest model family of the year: GPT-5.6 Sol, Terra, and Luna. It is the first OpenAI family named after celestial bodies, and flagship Sol dethrones Claude Mythos 5 on TerminalBench 2.1 with a record 91.9% score. All three tiers hit OpenAI's internal "High" cybersecurity classification. Following a U.S. government request, only about 20 vetted partner organizations can access the models today; broad availability is expected within weeks.

For AI engineers, agent developers, and enterprise decision-makers, this guide answers three questions: ① what Sol, Terra, and Luna are, how Max and Ultra modes work, and how pricing compares; ② full benchmark data across TerminalBench, Agent's Last Exam, CTF, ExploitBench, and GeneBench; ③ a six-step access and model-selection playbook, plus a head-to-head vs Mythos 5. Data through 2026-06-27.

01 Release context, solar naming, and government restriction pain points

OpenAI officially launched the GPT-5.6 series in the early hours of June 27, 2026 (Beijing time), introducing a solar-system naming scheme for the first time: Sol (the Sun) for flagship, Terra (Earth) for balanced, and Luna (the Moon) for lightweight tiers. This is OpenAI's most significant release since GPT-5.5, and the first family where every tier—including entry-level Luna—crossed OpenAI's internal "High" cybersecurity risk rating.

The launch was anything but smooth. Following President Trump's June 2, 2026 executive order allowing U.S. government agencies up to 30 days of pre-release access to review frontier AI models, OpenAI was asked to limit GPT-5.6 to government-vetted partners before a broad rollout. This marks the first time the U.S. government has formally required an AI company to restrict a frontier model release.

Core pain points developers face right now:

  • Access locked behind ~20 partners: ordinary ChatGPT users and most API customers cannot reach Sol, Terra, or Luna yet, despite the public announcement on June 26.
  • June flagship vacuum: Anthropic's Claude Fable 5 and Mythos 5 were forced offline on June 12 under export controls—see our Fable 5 ban and alternatives guide—while Google's Gemini 3.5 Pro slipped to July.
  • Routing uncertainty: teams that rebuilt pipelines around the leaked GPT-5.6 specs (see our June leak intel roundup) now face a three-tier product line with distinct price/performance curves.
  • Precedent risk: government pre-release review could become a recurring gate for every frontier drop, delaying global access and complicating compliance planning.

"We don't believe this kind of government access process should become the long-term default. It keeps the best tools from users, developers, enterprises, cyber defenders, and global partners who need them." — Sam Altman, OpenAI CEO

Altman publicly stated OpenAI would comply while pushing back against making government approval a permanent industry norm.

The "Big Three" flagship releases blocked in June 2026
Company | Model | Status
OpenAI | GPT-5.6 Sol / Terra / Luna | Limited preview (~20 approved partners)
Anthropic | Claude Fable 5 / Mythos 5 | Forced offline June 12 (U.S. export control)
Google | Gemini 3.5 Pro | Delayed to July (originally planned for June)

June 2026 was supposed to be the biggest month in AI history. Instead, all three flagship releases got stuck at the door. More context in TechTimes government lock analysis and the OpenAI official preview announcement.

02 Sol, Terra, Luna: models, Max/Ultra modes, and pricing

GPT-5.6 replaces the single-tier GPT-5.5 release cadence with a three-model lineup designed for different workload economics. All three share a reported ~1.5M token context window, up from GPT-5.5's 1M.

GPT-5.6 model comparison at a glance
Model | Tier | Input price | Output price | Best for
GPT-5.6 Sol | Flagship | $5 / 1M tokens | $30 / 1M tokens | Complex coding, security research, long-horizon agents
GPT-5.6 Terra | Balanced | $2.50 / 1M tokens | $15 / 1M tokens | High-volume business tasks, document analysis, customer support
GPT-5.6 Luna | Lightweight | $1 / 1M tokens | $6 / 1M tokens | Summarization, drafting, routine automation

GPT-5.6 Sol is OpenAI's most capable model to date, built for the hardest tasks: advanced programming, long-chain cybersecurity research, and multi-step autonomous agent workflows. Sol pricing matches GPT-5.5 at $5 input / $30 output per million tokens, but delivers substantially higher capability.

Two new Sol reasoning modes:

  • Max mode: Sol spends additional time reasoning before responding—trading latency for accuracy. Use when correctness matters more than speed.
  • Ultra mode: A multi-agent architecture. Sol decomposes complex tasks, spawns parallel subagents, executes in parallel, and merges results. This design is the core reason Sol achieved its TerminalBench record. Ultra consumes significantly more tokens and should be reserved for genuinely complex work.
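
The decompose, fan-out, and merge flow described for Ultra mode can be illustrated with a generic sketch. This is not OpenAI's implementation, which has not been published; `decompose`, `run_subagent`, and `merge` are hypothetical stand-ins, and the subagent here only echoes its input instead of calling a model.

```python
import asyncio


def decompose(task: str) -> list[str]:
    """Hypothetical planner: split a task on ';' into independent subtasks."""
    return [part.strip() for part in task.split(";") if part.strip()]


async def run_subagent(subtask: str) -> str:
    """Hypothetical subagent: stands in for one model call on one subtask."""
    await asyncio.sleep(0)  # placeholder for real network latency
    return f"done: {subtask}"


def merge(results: list[str]) -> str:
    """Hypothetical merge step: join subagent outputs in subtask order."""
    return "\n".join(results)


async def fan_out_and_merge(task: str) -> str:
    subtasks = decompose(task)
    # gather() runs the subagents concurrently and preserves input order.
    results = await asyncio.gather(*(run_subagent(s) for s in subtasks))
    return merge(list(results))


print(asyncio.run(fan_out_and_merge("write tests; fix build; update docs")))
```

Every subagent is a separate call with its own prompt and output, which is one plausible reason a fan-out design consumes more tokens than a single pass.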

GPT-5.6 Terra is the daily workhorse for enterprise-scale deployments: customer support, internal tools, and document analysis at volume. Performance is close to GPT-5.5 while costing 50% less—the best price/performance ratio for large-scale API usage.

GPT-5.6 Luna targets high-frequency, low-latency workloads: summarization, drafting, and lightweight automation. Luna is also the first non-flagship OpenAI model to receive a High capability rating in both cybersecurity and biology simultaneously, at 80% lower cost than Sol.

GPT-5.6 pricing vs GPT-5.5 and Claude Fable 5
Model | Input | Output | Notes
GPT-5.6 Sol | $5/M | $30/M | Same price as GPT-5.5, much higher performance
GPT-5.6 Terra | $2.50/M | $15/M | 50% cheaper than Sol; GPT-5.5-level performance
GPT-5.6 Luna | $1/M | $6/M | 80% cheaper than Sol
Claude Fable 5 (offline) | $10/M | $50/M | Sol lists at half the input price and 60% of the output price, with comparable or superior reported capability
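
To sanity-check the percentages in this table, a minimal sketch can price one request at each tier. The dictionary keys are illustrative labels rather than official API model identifiers, and the 20,000-input / 2,000-output request shape is an assumed example.

```python
# Per-million-token list prices in USD (input, output), as quoted above.
# Keys are illustrative labels, not official API model identifiers.
PRICES = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
    "fable-5": (10.00, 50.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at list price."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def savings_vs(model: str, baseline: str, input_tokens: int, output_tokens: int) -> float:
    """Fraction saved by using `model` instead of `baseline` for the same request."""
    base = request_cost(baseline, input_tokens, output_tokens)
    return 1 - request_cost(model, input_tokens, output_tokens) / base


# Assumed example request: 20,000 input tokens, 2,000 output tokens.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 20_000, 2_000):.3f}")
```

Because Terra and Luna scale both prices by the same factor, their savings against Sol are 50% and 80% regardless of the input/output mix. The Sol-versus-Fable 5 saving does depend on the mix, since the input and output ratios differ.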

Recommended use-case mapping:

  • Complex code generation, debugging, multi-step agents → Sol (Max or Ultra as needed)
  • Enterprise document analysis, customer support, high-volume API → Terra
  • High-frequency summarization, drafting, routine automation → Luna
  • Budget-limited but need GPT-5.5-class quality → Terra
  • Latency-critical real-time apps (from July) → Sol on Cerebras at up to 750 tokens/s
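
The mapping above can be expressed as a small lookup, sketched here under stated assumptions: the workload labels are hypothetical names chosen for illustration, and defaulting unknown workloads to Terra is a design choice of this sketch, not a vendor recommendation.

```python
from enum import Enum


class Tier(str, Enum):
    SOL = "sol"
    TERRA = "terra"
    LUNA = "luna"


# Hypothetical workload labels; the tier assignments mirror the list above.
ROUTES = {
    "code_generation": Tier.SOL,
    "debugging": Tier.SOL,
    "multi_step_agent": Tier.SOL,
    "document_analysis": Tier.TERRA,
    "customer_support": Tier.TERRA,
    "high_volume_api": Tier.TERRA,
    "summarization": Tier.LUNA,
    "drafting": Tier.LUNA,
    "routine_automation": Tier.LUNA,
}


def pick_tier(workload: str, default: Tier = Tier.TERRA) -> Tier:
    """Return the tier for a workload label, falling back to `default`."""
    return ROUTES.get(workload, default)


print(pick_tier("debugging").value)      # sol
print(pick_tier("summarization").value)  # luna
```

Keeping the table as data rather than branching logic makes it easy to re-point a workload once real latency and quality measurements are available.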

03 Benchmark results: TerminalBench, agents, cybersecurity, and life sciences

Benchmark figures below come from OpenAI's preview materials and the GPT-5.6 Deployment Safety System Card. Full independent verification awaits the complete system card at general release.

Coding: TerminalBench 2.1

TerminalBench 2.1 is among the most authoritative agentic coding benchmarks, with 89 complex command-line planning tasks testing multi-step tool use, iterative repair, and task coordination.

TerminalBench 2.1 leaderboard (June 2026)
Model | Score | Mode
GPT-5.6 Sol | 91.9% | Ultra (multi-agent)
GPT-5.6 Sol | 88.8% | Standard
Claude Mythos 5 | 88.0% | Standard
GPT-5.5 | 83.4% | Standard
Gemini 3.1 Pro Preview | 70.7% | Standard

Claude Mythos 5 had held the top spot for only 17 days (since June 9) before Sol displaced it. Coverage: SiliconAngle GPT-5.6 vs Mythos 5 analysis.
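
For readers who want the margins rather than the raw scores, a short sketch computes the gaps in percentage points and the 17-day figure from the dates given. It uses only numbers quoted in this section; the dictionary labels are just display names.

```python
from datetime import date

# TerminalBench 2.1 scores (%) as reported in the leaderboard above.
SCORES = {
    "GPT-5.6 Sol (Ultra)": 91.9,
    "GPT-5.6 Sol (Standard)": 88.8,
    "Claude Mythos 5": 88.0,
    "GPT-5.5": 83.4,
    "Gemini 3.1 Pro Preview": 70.7,
}


def gap(a: str, b: str) -> float:
    """Difference between two scores in percentage points, rounded to 0.1."""
    return round(SCORES[a] - SCORES[b], 1)


print(gap("GPT-5.6 Sol (Ultra)", "Claude Mythos 5"))         # 3.9
print(gap("GPT-5.6 Sol (Standard)", "Claude Mythos 5"))      # 0.8
print(gap("GPT-5.6 Sol (Ultra)", "GPT-5.6 Sol (Standard)"))  # 3.1

# Mythos 5 took the top spot on June 9; Sol was announced on June 26.
days_at_top = (date(2026, 6, 26) - date(2026, 6, 9)).days
print(days_at_top)  # 17
```

On these figures, most of Sol's lead over Mythos 5 comes from Ultra mode: the standard-mode margin is 0.8 points, while Ultra adds a further 3.1.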

Long-horizon agents: Agent's Last Exam

Agent's Last Exam task completion (code mode)
Model | Completion rate
GPT-5.6 Sol | 50.9% (only model above 50%)
GPT-5.6 Luna | Slightly above GPT-5.5

Cybersecurity: CTF and ExploitBench

GPT-5.6 is the first OpenAI product line where all three tiers trigger a "High" cybersecurity risk classification.

Capture-the-Flag (CTF) hit rates
Model | Hit rate
Sol | 96.7%
Terra | 91.84%
Luna | 85.19%

On ExploitBench, Sol matches Anthropic's Mythos Preview performance while using only about one-third of the output tokens—the same security-research capability at dramatically lower cost.

Safety note: OpenAI red-teaming confirmed Sol can identify vulnerabilities and exploit primitives in Chromium and Firefox codebases, but cannot autonomously construct a complete, functional exploit chain against hardened real-world targets. It remains below OpenAI's "Cyber Critical" threshold.

Life sciences: GeneBench v1 and HealthBench

  • GeneBench v1 (genomics and quantitative biology): Sol matches or exceeds GPT-5.5 using fewer tokens.
  • HealthBench Professional: Sol scores 60.5, an 8.7-point improvement over GPT-5.5.

Speed: Cerebras deployment in July

Starting July 2026, GPT-5.6 Sol will deploy on Cerebras hardware for select enterprise customers at up to 750 tokens per second. Most frontier models today run at 50–150 tokens/s, so peak Cerebras throughput would be 5× to 15× faster for real-time coding assistants and streaming agent applications. At the top of that range, a response that takes 10 seconds to generate at 50 tokens/s could complete in under one second.
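
The speedup range and the "under one second" claim follow from simple division, sketched below. The 500-token response length is an assumed example (it is what takes 10 seconds at 50 tokens/s), and the calculation covers steady-state generation only, not queueing, network latency, or time to first token.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to generate a response at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


CEREBRAS_PEAK = 750.0          # tokens/s, the "up to" figure quoted above
TYPICAL_RANGE = (50.0, 150.0)  # tokens/s, typical frontier baseline quoted above

# Speedup against the slow and fast ends of the typical range.
speedups = [CEREBRAS_PEAK / rate for rate in TYPICAL_RANGE]
print(speedups)  # [15.0, 5.0]

response_tokens = 500  # assumed example: 10 s at 50 tokens/s
print(generation_seconds(response_tokens, 50.0))                     # 10.0
print(round(generation_seconds(response_tokens, CEREBRAS_PEAK), 2))  # 0.67
```

Against a 150 tokens/s baseline the same response already takes about 3.3 seconds, so the sub-second result represents the 5× end of the range in that case.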

Safety infrastructure built into GPT-5.6:

  • Real-time misuse classifiers on every output
  • Account-level review for sensitive workflows
  • 700,000 A100-equivalent GPU hours of automated red-teaming
  • Universal jailbreak testing across cross-prompt attack vectors
  • Specialized large reasoning model as a secondary filter if primary safeguards fail
  • External security organization review before launch

04 Six steps to access GPT-5.6 and pick the right tier

With general availability still weeks away, teams should prepare routing, budgets, and test harnesses now rather than waiting for ChatGPT rollout.

  1. Track the access timeline: Subscribe to the OpenAI official blog, VentureBeat launch coverage, and Polymarket contracts. Traders currently assign an 87% probability that GPT-5.6 will be broadly released by July 31, 2026.
  2. Map your workload to Sol / Terra / Luna: Reserve Sol Ultra for multi-step agent pipelines where TerminalBench-class performance justifies token cost; route bulk document and support workloads to Terra; push summarization and classification to Luna.
  3. Rebuild cost models with three price points: Sol at $5/$30, Terra at $2.50/$15, Luna at $1/$6 per million tokens. Model Ultra mode as a 2–4× token multiplier on complex agent tasks.
  4. Prepare fallback routing while Mythos 5 is offline: Maintain LiteLLM or equivalent multi-provider gateways. Cross-read our Fable 5 alternatives guide and June leak roundup for interim model choices.
  5. Stage benchmark harnesses before API GA: Pre-build TerminalBench-style evals, CTF smoke tests, and Agent's Last Exam subsets so you can compare Sol standard vs Ultra on day one of API access.
  6. Plan for Cerebras latency tier in July: If sub-second streaming matters (live coding copilots, customer-facing agents), flag Sol-on-Cerebras at up to 750 tokens/s for enterprise procurement; keep Terra/Luna on standard inference for cost-sensitive batch work.
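
Step 3's cost model can be sketched as a monthly budget function with Ultra treated as a 2–4× token multiplier. The request volume and token counts are hypothetical, and applying the multiplier to both input and output tokens is an assumption of this sketch, since no breakdown of Ultra's overhead is given.

```python
def monthly_cost(
    requests: int,
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    token_multiplier: float = 1.0,
) -> float:
    """Monthly USD spend; prices are per million tokens.

    `token_multiplier` scales both input and output tokens (an assumption
    used here to approximate Ultra mode's extra token consumption).
    """
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return requests * token_multiplier * per_request


# Hypothetical workload: 10,000 agent runs per month, 20k input / 2k output tokens each.
SOL = {"input_price": 5.00, "output_price": 30.00}
standard = monthly_cost(10_000, 20_000, 2_000, **SOL)
ultra_low = monthly_cost(10_000, 20_000, 2_000, token_multiplier=2.0, **SOL)
ultra_high = monthly_cost(10_000, 20_000, 2_000, token_multiplier=4.0, **SOL)
print(standard, ultra_low, ultra_high)
```

Budgeting against the low and high multiplier gives a range rather than a point estimate, which is useful until real Ultra token usage can be measured on your own tasks.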

GPT-5.6 access timeline
Phase | Timeline | Access
Current (June 2026) | Now | ~20 government-approved partners via API and Codex only
General release | Within weeks (July 2026 expected) | ChatGPT Plus/Pro first, then public API
Cerebras Sol | July 2026 | Select enterprise customers, up to 750 tokens/s
Government review window | ~July 2, 2026 (30-day EO window) | U.S. cyber executive order framework finalization expected
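
While access is phased as in the timeline above, the fallback routing from step 4 can be prototyped without committing to any gateway. The sketch below is provider-agnostic and does not use LiteLLM's API; `ModelUnavailable`, `gated_model`, and `available_model` are hypothetical stubs, and the model labels are illustrative only.

```python
from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Raised by a (hypothetical) provider call when a model cannot be reached."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return the first (name, response)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ModelUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise ModelUnavailable("; ".join(errors) or "no providers configured")


def gated_model(prompt: str) -> str:
    """Stub for a model still in limited preview."""
    raise ModelUnavailable("limited preview")


def available_model(prompt: str) -> str:
    """Stub for a model that is reachable today."""
    return f"echo: {prompt}"


chain = [("sol", gated_model), ("interim-model", available_model)]
print(call_with_fallback("hi", chain))  # ('interim-model', 'echo: hi')
```

Returning the name of the model that actually answered makes it straightforward to log how often traffic falls through to the interim choice.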

05 GPT-5.6 Sol vs Claude Mythos 5, citable hard data, and FAQ

GPT-5.6 Sol vs Claude Mythos 5 head-to-head
Dimension | GPT-5.6 Sol | Claude Mythos 5
TerminalBench 2.1 (coding) | 91.9% (Ultra) / 88.8% (standard) | 88.0%
ExploitBench (cybersecurity) | Near-identical to Mythos Preview, ~1/3 output tokens | Strong (restricted access, data not fully public)
Input / output pricing | $5 / $30 per M | $10 / $50 per M (currently offline)
Availability | Limited preview; general release within weeks | Offline since June 12 (U.S. export control)
Context window | ~1.5M tokens | 200K tokens

Bottom line: Sol leads on TerminalBench and offers comparable security-research capability at a fraction of the cost. Mythos 5 may still lead on benchmarks like SWE-Bench Pro where GPT-5.6 system card data has not been fully published. Fable 5 held advantages on other agentic coding dimensions before going offline.

Citable hard data (through 2026-06-27):

  • TerminalBench 2.1: Sol 91.9% (Ultra), 88.8% (standard); Mythos 5 88.0%; GPT-5.5 83.4%; Gemini 3.1 Pro Preview 70.7%; Mythos 5 dethroned after 17 days at #1
  • Agent's Last Exam: Sol 50.9% task completion—the only model above 50%
  • CTF hit rates: Sol 96.7%, Terra 91.84%, Luna 85.19%
  • ExploitBench token efficiency: Sol matches Mythos Preview at roughly one-third output token cost
  • HealthBench Professional: Sol 60.5 (+8.7 vs GPT-5.5)
  • Cerebras Sol speed: up to 750 tokens/s from July 2026 (vs 50–150 tokens/s typical frontier baseline)
  • Red-teaming investment: 700,000 A100-equivalent GPU hours before launch
  • Access restriction: ~20 vetted partner organizations under White House / OSTP / ONCD coordinated review
  • Polymarket broad-release odds: 87% by July 31, 2026

FAQ — the questions developers ask most:

Q1: Is GPT-5.6 available on ChatGPT now?
Not for the general public. Access is limited to roughly 20 trusted partner organizations. Full ChatGPT rollout for Plus and Pro users is expected within weeks (July 2026).
Q2: Is GPT-5.6 Sol better than Claude Fable 5 for coding?
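The published head-to-head is against Mythos 5: Sol leads on TerminalBench 2.1 (91.9% in Ultra mode vs 88.0% for Mythos 5). Fable 5 has been ahead on SWE-Bench Pro, but that comparison stays open until official GPT-5.6 SWE-Bench scores are published. On the published numbers, Sol offers comparable or better performance at a lower list price ($5 / $30 per M vs $10 / $50 per M).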
Q3: What is Ultra mode in GPT-5.6 Sol?
Ultra mode deploys multiple AI subagents that work in parallel on different parts of a task, then synthesize a unified result. It significantly boosts performance on complex tasks but uses considerably more tokens.
Q4: Why is GPT-5.6 restricted?
Following Trump's June 2 executive order, the U.S. government (via White House, OSTP, and ONCD) requested OpenAI limit access during a security review period. OpenAI complied but publicly stated it opposes this becoming permanent practice.
Q5: How fast will GPT-5.6 be on Cerebras?
Up to 750 tokens per second—roughly 5–15× faster than most current frontier models. Launching July 2026 for select enterprise customers.
Q6: What is the GPT-5.6 context window size?
Reported at approximately 1.5 million tokens, up from GPT-5.5's 1 million. Official confirmation expected with the full system card release.
Q7: Are all three GPT-5.6 models safe for cybersecurity work?
All three carry OpenAI's "High" cybersecurity risk rating. OpenAI built layered safeguards, including real-time misuse classifiers and 700,000 A100-equivalent GPU hours of automated red-teaming, and its red-teaming found that Sol cannot autonomously construct a complete, functional exploit chain against hardened real-world targets.
Q8: Sol, Terra, or Luna—which should I use?
Sol for complex agents and security research; Terra for high-volume business workloads at GPT-5.5-class quality; Luna for summarization, drafting, and millions of lightweight daily API calls.

06 Closing strategy and production environment guidance

GPT-5.6 advances on three axes: capability (Sol's Ultra multi-agent mode tops TerminalBench 2.1 and dethroned Mythos 5 after just 17 days), efficiency (ExploitBench parity at roughly one-third the output tokens), and speed (up to 750 tokens/s on Cerebras, reshaping real-time agent UX). It also sets a precedent: the U.S. government formally intervened in a frontier model release for the first time, and the tension between national security review and open access will shape every major drop ahead.

For teams deploying production-grade coding agents today, cloud API access alone does not solve three hidden costs: long-connection jitter from oversold shared VPS hosts, API unit pricing that swings with capex cycles, and multi-agent pipelines that lack stable 24/7 Mac hosts for local gateways, MCP server clusters, and Codex routing. Sol on Cerebras will be fast, but your agent orchestration layer, test harnesses, and fallback routing still need dedicated, low-jitter edge compute.

For production environments running coding agents, local inference gateways, or MCP server clusters, JEXCLOUD multi-region bare-metal Mac nodes offer a better fit: dedicated Apple Silicon unified memory, no overselling jitter, launchd-resident agent gateways, and 120-second delivery. See nodes and pricing on the JEXCLOUD pricing page.