Grok 4.20 Statistics 2026 | Accuracy, Benchmarks & Facts

Grok 4.20 in 2026

Few AI models in recent memory have arrived with as much momentum, or as much controversy, as Grok 4.20. Built by xAI, Elon Musk’s artificial intelligence company, Grok 4.20 officially launched as a beta on February 17, 2026, and was iterated into Grok 4.20 0309 v2 by April 7, 2026, cementing its place as the flagship reasoning model in the Grok 4 series. Under the hood, it introduces a multi-agent architecture in which four specialized agents collaborate in real time to debate, fact-check, and refine answers, a structural shift that separates it from the single-pass inference models that dominated earlier generations. Trained on xAI’s Colossus supercluster, which now houses 555,000 NVIDIA GPUs at an estimated cost of $18 billion, Grok 4.20 operates with a 2 million token context window and an output speed of 106.1 tokens per second, roughly 67% above the 63.3 tokens-per-second average for reasoning models.

What makes Grok 4.20 globally significant in 2026 is not just its raw benchmark performance — it is the scale and speed at which it has captured real-world usage. Grok.com recorded 314 million monthly visits in January 2026, making it the third most-visited generative AI platform on the planet, behind only ChatGPT and Google Gemini. The broader xAI platform crossed 64 million monthly active users, and the company completed a landmark $20 billion Series E round in January 2026 at a $230 billion valuation — followed by an even more dramatic corporate transformation when SpaceX acquired xAI in February 2026, valuing the combined entity at $1.25 trillion. Against that backdrop, understanding Grok 4.20’s actual benchmark scores, pricing, traffic profile, and competitive position gives any marketer, developer, or enterprise decision-maker a sharper picture of exactly where this model sits in the global AI landscape today.

Interesting Facts About Grok 4.20 in 2026

  GROK 4.20 AT A GLANCE — KEY FACTS SNAPSHOT (2026)
  ══════════════════════════════════════════════════════════════
  Release Date (v2)          April 7, 2026
  Developer                  xAI (Elon Musk)
  Context Window             ██████████████████████████████  2,000,000 tokens
  Intelligence Index Score   ████████████████████████████    49 / 100 (Artificial Analysis)
  Output Speed               ████████████████████████████████  106.1 tokens/second
  Input Pricing              $2.00 / 1M tokens
  Output Pricing             $6.00 / 1M tokens
  Global Monthly Visits      ████████████████████████████████████████  314M (Jan 2026)
  Monthly Active Users       ██████████████████████████████████  64 Million
  xAI Valuation (Jan 2026)   ████████████████████████████████████████  $230 Billion
  ══════════════════════════════════════════════════════════════

Fact | Detail
Model Name & Version | Grok 4.20 (latest: 0309 v2, released April 7, 2026)
Developer | xAI (founded 2023 by Elon Musk)
Model Architecture | Multi-agent: 4 specialized agents collaborating in real time for fact-checking and reasoning
Context Window | 2,000,000 tokens (largest among mainstream frontier models)
Artificial Analysis Intelligence Index | 49 / 100 (well above peer median of 35)
Output Speed (API) | 106.1 tokens/second (above the reasoning model average of 63.3 t/s)
Time to First Token (TTFT) | 22.31 seconds (higher-end latency typical of deep-reasoning models)
Input Pricing | $2.00 per 1M tokens
Output Pricing (API) | $6.00 per 1M output tokens (via xAI API)
Benchmark Suite Cost | $514.16 to run the full Artificial Analysis Intelligence Index v4.0
Hallucination Rate (Grok 4.1) | Reduced ~3x, from ~12% to ~4% in production traffic
Grok 4 AIME 2025 Score | 91.7% (mathematics olympiad benchmark)
Grok 4 MMLU Score | 92.1% (general knowledge across 57 subjects)
Grok 4 GPQA Score | 87.5% (graduate-level science reasoning)
Grok 4 HLE Score | 40.0% (Heavy variant: 50.7%) on Humanity’s Last Exam
Grok 4.1 FActScore | 97.0% (factual accuracy benchmark)
LMArena Elo (Grok 4.1 Thinking) | 1,483 Elo; ranked #1 globally, 31 points ahead of nearest non-xAI model
GDPval-AA Elo (Grok 4.20 v2) | 1,179 (Grok 4.3 subsequently improved this to 1,500)
Global Rank (Cloudflare Radar 2025) | 9th among all generative AI services globally
Colossus GPU Cluster | 555,000 NVIDIA GPUs, estimated cost $18 billion
xAI Series E Funding (Jan 2026) | $20 billion at a $230 billion valuation
SpaceX–xAI Merger (Feb 2026) | Combined valuation: $1.25 trillion

Source: Artificial Analysis (April 2026), xAI official announcements, Business of Apps (2026), SQ Magazine (May 2026), Reuters (Feb 2026), FatJoe (April 2026)

The facts table above tells a story of a model that is pushing two frontiers simultaneously: raw reasoning capability and sheer global scale. The 2 million token context window, the largest of any mainstream frontier model, gives Grok 4.20 a structural advantage in tasks like analyzing lengthy financial documents, auditing codebases, or processing scientific literature end-to-end without chunking. Combined with the multi-agent architecture in which four agents actively debate and fact-check outputs, the roughly 4% production hallucination rate reported for Grok 4.1 (down from about 12% in earlier iterations) makes a compelling case for enterprise reliability. The 106.1 tokens-per-second output speed sits about 67% above the reasoning model average of 63.3 tokens per second, which means users are not trading throughput for depth, a tradeoff that historically plagued chain-of-thought heavy models.
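The relative figures quoted above can be reproduced with a few lines of arithmetic. The sketch below is only a sanity check on the published numbers; the helper name `pct_above` is an illustrative choice and is not taken from any source.

```python
def pct_above(value: float, baseline: float) -> float:
    """Return how far `value` sits above `baseline`, as a percentage."""
    return (value / baseline - 1.0) * 100.0


# Figures quoted in the facts table.
speed_gap = pct_above(106.1, 63.3)   # output speed vs. reasoning-model average
index_gap = pct_above(49, 35)        # Intelligence Index vs. peer median
hallucination_factor = 12 / 4        # ~12% -> ~4% production hallucination rate

print(f"Output speed: {speed_gap:.1f}% above the reasoning-model average")
print(f"Intelligence Index: {index_gap:.1f}% above the peer median")
print(f"Hallucination rate reduced by a factor of {hallucination_factor:.0f}")
```

The speed gap works out to roughly 67.6%, which the article rounds down to 67%.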

The corporate and infrastructure facts are just as striking as the technical ones. The $20 billion Series E in January 2026 — secured at a $230 billion valuation — ranks among the largest single AI funding rounds in history, and the SpaceX acquisition of xAI at a $1.25 trillion combined valuation just weeks later fundamentally rewrote the ownership and strategic context around Grok. With 555,000 NVIDIA GPUs deployed in the Colossus cluster and xAI spending approximately $1 billion per month on infrastructure and training, the resource advantage behind Grok 4.20 is a structural moat that few companies on earth can match. The Grok 4 Heavy variant’s 50.7% score on Humanity’s Last Exam — the benchmark explicitly designed to resist AI saturation — cemented xAI’s claim to frontier-level reasoning in a way that is difficult to dispute.

Grok 4.20 Global Benchmark Performance in 2026

  GROK 4.20 BENCHMARK SCORES — GLOBAL COMPARISON
  ═══════════════════════════════════════════════════════════════
  Benchmark            Grok 4.20    Peer Median    Top Score (2026)
  ───────────────────────────────────────────────────────────────
  Intelligence Index   49           35             53 (Grok 4.3)
  MMLU (General)       92.1%        ~88-90%        ~93-94%
  GPQA (Science)       87.5%        ~70-80%        94.6% (Claude Mythos)
  AIME 2025 (Math)     91.7%        ~60-70%        ~95%+
  HLE (Hard Reasoning) 40.0%        ~15-25%        50.7% (Grok 4 Heavy)
  LiveCodeBench        79.0%        ~50-60%        ~80-85%
  FActScore (Grok 4.1) 97.0%        ~85-90%        97.0% (Grok 4.1)
  LMArena Elo (4.1 T.) 1,483        ~1,300-1,400   1,483 (Grok 4.1 T.)
  ═══════════════════════════════════════════════════════════════
  T. = Thinking variant

Benchmark | Grok 4.20 Score | What It Measures | Context vs. Global Peers
Artificial Analysis Intelligence Index | 49 / 100 | Composite: reasoning, knowledge, math, coding | 40% above peer median of 35
MMLU (General Knowledge) | 92.1% | 57 academic subjects, general knowledge | Top-tier; benchmark near saturation at frontier
GPQA Diamond (Science) | 87.5% | Graduate-level biology, physics, chemistry | Above average; top score 94.6% (Claude Mythos)
AIME 2025 (Mathematics) | 91.7% | Olympiad-level math problems | Exceptional; human top competitors ~90-95%
HMMT25 | 90.0% | Harvard-MIT Math Tournament (2025) | Among highest globally
LiveCodeBench (Coding) | 79.0% | Real-world competitive programming | Above the ~50–60% peer average
Humanity’s Last Exam (HLE) | 40.0% (Grok 4) / 50.7% (Heavy) | Multi-domain expert reasoning; hardest benchmark | First model to score 50% on HLE (Heavy variant)
FActScore (Grok 4.1) | 97.0% | Factual accuracy in free-form generation | Industry-leading factual reliability
LMArena Elo (Grok 4.1 Thinking) | 1,483 | Human-preference chat arena | #1 globally, 31 Elo points clear of nearest rival
GDPval-AA (Grok 4.20 v2) | 1,179 Elo | Real-world agentic tasks | Exceeded by Grok 4.3’s 1,500 Elo in April 2026
ARC-AGI V2 (Grok 4) | 15.9% | Abstract visual reasoning; AGI proxy | Nearly doubled prior record (~8.6%)
USAMO 2025 (Grok 4 Heavy) | 61.9% | US Math Olympiad; proof-based problems | #1 globally on this benchmark
Vending-Bench (Agentic) | $4,694 net worth | Autonomous multi-step agentic task simulation | Vastly outperforms human baseline ($844)

Source: Artificial Analysis (April 2026), xAI official Grok 4 launch page (July 2025), SQ Magazine Grok AI Statistics (May 2026), LLM Benchmarks 2026 (iternal.ai)

Looking at Grok 4.20’s benchmark profile globally, the model’s clearest strength lies in mathematical reasoning and hard multi-domain problems — precisely the areas where the gap between frontier models and everything else remains widest. The 91.7% AIME 2025 score and 90.0% HMMT25 place it in elite company, while the Grok 4 Heavy variant’s 50.7% on Humanity’s Last Exam was a landmark: the first AI model in history to cross the 50% threshold on a benchmark deliberately engineered to resist saturation by advanced AI. The 97.0% FActScore for Grok 4.1 — and the production hallucination rate drop from 12% to 4% — are the kinds of reliability improvements that matter most to enterprises deploying AI in regulated or accuracy-critical workflows.

Where Grok 4.20 sits more modestly is in the composite Artificial Analysis Intelligence Index score of 49, which, while 40% above the peer median of 35, is still 4 points behind Grok 4.3’s score of 53 released just weeks later in April 2026. This rapid iteration — from Grok 4.20 to Grok 4.20 v2 to Grok 4.3 within weeks — underscores that benchmark leadership in 2026 is a moving target, and Grok’s own version succession has been aggressive even by the accelerated standards of the current AI race. The GPQA Diamond score of 87.5% — while impressive in absolute terms — also sits below the 94.6% achieved by Claude Mythos Preview, signaling that scientific reasoning at the very frontier is still a competitive space where no single model dominates cleanly.
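To make the distance to the frontier concrete, the short sketch below computes the point gap between the scores the comparison table lists for Grok and the best reported score on three benchmarks. The dictionary layout and variable names are illustrative; the numbers are the ones quoted in the table.

```python
# (score listed for Grok, best reported score) pairs from the comparison table.
scores = {
    "Intelligence Index": (49.0, 53.0),   # top: Grok 4.3
    "GPQA Diamond": (87.5, 94.6),         # top: Claude Mythos
    "HLE": (40.0, 50.7),                  # top: Grok 4 Heavy
}

# Gap in points between the listed score and the top score.
gaps = {name: round(top - grok, 1) for name, (grok, top) in scores.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap} points behind the top score")
```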

Grok 4.20 Global Traffic & User Growth Statistics in 2026

  GROK.COM MONTHLY VISITS — GLOBAL GROWTH TRAJECTORY
  ═══════════════════════════════════════════════════════════════
  Aug 2025   ████████████████                  ~140-150M visits
  Nov 2025   ████████████████████████          234.4M visits   (+14% MoM)
  Jan 2026   ████████████████████████████████  314M visits     (record high)
  Feb 2026   ███████████████████████████████   298.6M visits   (-4.9% MoM)
  Mar 2026   ████████████████████████████████████ 326.3M visits (NEW RECORD +9.3%)
  ═══════════════════════════════════════════════════════════════
  Year-over-Year Growth (Mar 2025 → Mar 2026): +61.03%

Traffic Metric | Data Point | Period / Source
Monthly Web Visits (Record) | 326.3 million | March 2026 (all-time high)
Monthly Web Visits (Jan 2026) | 314 million | January 2026 (Similarweb / Forbes)
Monthly Web Visits (Feb 2026) | 298.6 million | February 2026 (Similarweb)
Month-over-Month Change (Feb→Mar) | +9.3% | March 2026
Year-over-Year Change (Mar 2025→2026) | +61.03% | March 2026
Global Website Rank | 53rd globally | February 2026 (Similarweb)
Avg. Visit Duration | 12 min 57 sec | February 2026 (Similarweb)
Avg. Pages Per Visit | 21.41 | February 2026
Bounce Rate | 26.48% | February 2026
Desktop vs. Mobile Split | 78.62% desktop / 21.38% mobile | 2026
Direct Traffic Share | 72.93% – 78.37% | 2026 (Similarweb)
Daily Queries Processed | ~134 million | 2026 (humanizeai.io)
Global Rank, Generative AI Services | 9th (new entry) | Cloudflare Radar 2025 Year in Review
Platform Global Rank vs. Competitors | 3rd (after ChatGPT, Gemini) | January 2026

Source: Similarweb via Business of Apps, FatJoe (April 2026), HumanizeAI.io (April 2026), Cloudflare Radar 2025

The traffic data for Grok.com in 2026 paints a picture of a platform that has moved decisively from niche AI curiosity to mainstream consumer destination. The 326.3 million visits in March 2026 — a 61% year-over-year increase and an all-time record — is a number that most standalone AI platforms would be proud to carry for their entire existence, let alone a product barely two-and-a-half years old. The 12 minutes and 57 seconds average session duration is particularly telling: it places Grok among the most deeply engaging AI products globally, ahead of competitors like Gemini, and suggests users are not dropping in for a quick search-engine-style query but returning for sustained, substantive interactions.

The 26.48% bounce rate — one of the lowest in the generative AI category — reinforces this pattern of high-intent usage. For context, industry-standard acceptable bounce rates for web applications hover around 40–60%; Grok’s near-27% figure implies that roughly 73 out of every 100 visitors engage meaningfully beyond the landing page. The 78.62% desktop dominance is also notable: it suggests Grok’s primary use case in 2026 is still driven by professional, developer, and research workflows conducted on larger screens — not casual mobile scrolling. The 21.41 pages per visit further confirms this depth, indicating users are navigating across features, switching models, or running multi-step workflows rather than asking a single question and leaving.
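The growth percentages in this section follow directly from the monthly visit counts. The sketch below recomputes them; the implied March 2025 figure is derived here from the quoted year-over-year growth and is not a number reported by any of the sources.

```python
def pct_change(new: float, old: float) -> float:
    """Percentage change from `old` to `new`."""
    return (new / old - 1.0) * 100.0


# Monthly visits to Grok.com in millions, as quoted in the traffic table.
jan_2026, feb_2026, mar_2026 = 314.0, 298.6, 326.3

feb_mom = pct_change(feb_2026, jan_2026)   # January -> February
mar_mom = pct_change(mar_2026, feb_2026)   # February -> March

# Derived, not reported: March 2025 visits implied by +61.03% YoY growth.
implied_mar_2025 = mar_2026 / (1 + 61.03 / 100)

# Visitors per 100 who do not bounce, given a 26.48% bounce rate.
engaged_per_100 = 100 - 26.48

print(f"Feb MoM: {feb_mom:+.1f}%  Mar MoM: {mar_mom:+.1f}%")
print(f"Implied March 2025 visits: ~{implied_mar_2025:.0f}M")
print(f"Engaged visitors per 100: ~{engaged_per_100:.0f}")
```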

Grok 4.20 Global User Base & Market Share Statistics in 2026

  GROK GLOBAL TRAFFIC SHARE BY COUNTRY (2026)
  ════════════════════════════════════════════════════════
  United States  ████████████████████████  21–24% of visits
  India          ████████████              8–10%
  Brazil         ████████                  4–5%
  South Korea    █████                     3.5%
  Vietnam        ████                      4%
  Hong Kong      ███                       3.1%
  UK             ████                      5.5%
  Pakistan       ████                      5.4%
  Other Markets  █████████████████████     ~40–45% (remainder after the listed markets)
  ════════════════════════════════════════════════════════

User / Market Metric | Figure | Source / Period
Global Monthly Active Users | ~60–64 million | January 2026 (xAI internal, Business of Apps)
Total App Downloads (All Time) | ~100 million | 2026 (Business of Apps)
Google Play Store Downloads | 50 million+ | 2026 (Google Play)
iOS Daily Downloads (Post Grok 4 Launch) | 197,000 / day (+279%) | July 11, 2025 (App Store data)
Top Traffic Country | United States (21–24% of visits) | 2026 (Similarweb)
2nd Largest Market | India (~8–10%) | 2026
Fastest-Growing Markets | India (+42% MoM), Brazil, Vietnam | March 2026
Gender Split | 60.19% male / 39.81% female | 2026
Largest Age Group | 25–34 years (51.4% of all users) | 2026
Second Largest Age Group | 18–24 years (12.2%) | 2026
Average Session Time (Mobile) | 4 min 58 sec | 2026 (HumanizeAI.io)
Coding-Related Queries | 18–25% of all queries | 2026
News & Trends Queries | ~30% of all queries | 2026
Developer Engagement Growth | +50% year-over-year | 2026
Grok Imagine Video Generation (Jan 2026) | 1.245 billion videos | January 2026

Source: Business of Apps (March 2026), SEOProfy (Feb 2026), FatJoe (April 2026), Bayelsawatch (April 2026), HumanizeAI.io (April 2026)

The global user profile of Grok in 2026 reveals a platform with genuine international reach that extends well beyond its American origins. The United States leads with 21–24% of traffic, which means roughly three-quarters of visits come from outside the US; India (~8–10%), the UK (~5.5%), Pakistan (~5.4%), Brazil (~4–5%), and Vietnam (~4%) are the largest of those markets. That is a meaningful shift from the platform’s early profile as primarily a product of and for the Anglophone West. India’s 42% month-over-month traffic increase in March 2026 is the most striking single geographic signal: it points to an emerging market uptake curve that, if sustained, could reshape the platform’s global center of gravity within 12–18 months.

The demographic concentration in the 25–34 age cohort (51.4% of all users) reflects a user base that is overwhelmingly composed of young professionals, developers, researchers, and early adopters — exactly the audience that drives enterprise AI adoption cycles. The 18–25% coding query share and 50% year-over-year developer engagement growth confirm that Grok is establishing a real foothold in the developer toolchain, not just as a consumer chatbot. The 1.245 billion videos generated by Grok Imagine in January 2026 alone also underscores that multimodal usage — particularly AI-generated video — has become a genuinely mainstream behavior on the platform, with scale numbers that rival or exceed dedicated image generation services.

Grok 4.20 Global Revenue & Funding Statistics in 2026

  xAI FUNDING ROUNDS — CUMULATIVE CAPITAL RAISED
  ════════════════════════════════════════════════════════
  Series A (Nov 2023)  ██  $134.7M  @ $673M valuation
  Series B (May 2024)  █████  $6B   @ $24B valuation
  Series C (Dec 2024)  ████████  $6B  @ $50B valuation
  Series D (Jun 2025)  █████████████  $10B ($5B debt + $5B equity)
  Series E (Jan 2026)  ████████████████████████████████████  $20B @ $230B valuation
  ════════════════════════════════════════════════════════
  Total Raised: $42+ Billion | SpaceX Merger Valuation: $1.25 Trillion

Revenue / Funding Metric | Figure | Source / Period
xAI Total Funding Raised | $42+ billion | All rounds through 2026
Series E Round (Jan 2026) | $20 billion at $230 billion valuation | Reuters, January 2026
xAI Q3 2025 Revenue | $107 million (quarter ended Sept 30, 2025) | Reuters
xAI Q3 2025 Net Loss | $1.46 billion | Reuters
Grok 2025 Full-Year Revenue Estimate | ~$300–$350 million | Business of Apps, 2026
Projected 2026 Revenue | ~$2 billion | Business of Apps (2026)
Monthly Infrastructure Spend (xAI) | ~$1 billion/month | 2026 estimates
SuperGrok Subscription Price | $30/month or $300/year | xAI official pricing
SuperGrok Heavy Price | $300/month | xAI official pricing
Premium+ (X Platform) Price | $40/month or $395/year | xAI official pricing
Grok Business (per seat) | $30/seat/month | xAI official pricing
API Input Pricing (Grok 4.20) | $2.00 per 1M tokens | xAI API (Artificial Analysis)
API Output Pricing (Grok 4.20) | $6.00 per 1M tokens | xAI API
US DoD AI Contract (xAI) | Up to $200 million | July 2025
SpaceX–xAI Merger Valuation | $1.25 trillion (all-stock, Feb 2026) | Reuters, February 2026
App Monthly Revenue (Mar 2026 est.) | ~$12 million/month | Sensor Tower estimate

Source: Reuters (Jan & Feb 2026), Business of Apps (March 2026), xAI official pricing, SQ Magazine (May 2026), Bayelsawatch (April 2026)

The revenue and funding trajectory of xAI is one of the most striking capital stories in the technology industry in 2026. Moving from a $134.7 million Series A in November 2023 to a $20 billion Series E in January 2026 — a period of just 26 months — represents a valuation increase of more than 340x, making it one of the fastest valuation climbs in startup history. Yet the $1.46 billion net loss in Q3 2025 alone, against $107 million in quarterly revenue, makes clear that xAI is still deep in the investment phase of its growth curve, spending on the Colossus cluster expansion, model training runs, and infrastructure at approximately $1 billion per month. The projected $2 billion in 2026 revenue — primarily driven by SuperGrok subscriptions, X Premium bundles, API revenue, and the $200 million DoD contract — would represent a roughly 6x year-over-year increase, but still a long way from break-even.
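The multiples in the paragraph above can be checked against the round-by-round figures. This is a back-of-the-envelope sketch using only the numbers quoted in this section; the variable names are illustrative.

```python
# Funding rounds in billions of USD, as listed in the funding chart.
rounds_bn = {
    "Series A": 0.1347,
    "Series B": 6.0,
    "Series C": 6.0,
    "Series D": 10.0,
    "Series E": 20.0,
}
total_raised_bn = sum(rounds_bn.values())

# Valuation climb from Series A ($673M) to Series E ($230B), both in $M.
valuation_multiple = 230_000 / 673

# Months between November 2023 and January 2026.
months_elapsed = (2026 - 2023) * 12 + (1 - 11)

# Q3 2025: dollars lost per dollar of revenue ($1.46B loss, $107M revenue).
loss_per_revenue_dollar = 1_460 / 107

# Projected 2026 revenue ($2B) vs. the 2025 estimate range ($300M-$350M).
growth_low, growth_high = 2_000 / 350, 2_000 / 300

print(f"Total raised: ${total_raised_bn:.2f}B over {months_elapsed} months")
print(f"Valuation multiple: {valuation_multiple:.0f}x")
print(f"Q3 2025 loss per revenue dollar: ${loss_per_revenue_dollar:.1f}")
print(f"Projected revenue growth: {growth_low:.1f}x to {growth_high:.1f}x")
```

The projected growth range of about 5.7x to 6.7x is consistent with the "roughly 6x" figure in the text.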

The $1.25 trillion combined valuation of the SpaceX–xAI merger in February 2026 is the corporate event that most profoundly changes the long-term context for Grok’s global trajectory. Elon Musk cited “orbital data centres” as the strategic rationale — the vision of deploying AI compute infrastructure in space via SpaceX’s Starship and Starlink networks. If that infrastructure ambition materializes, it would give Grok a distribution channel and compute substrate that no other AI company on earth currently possesses. The $300/month SuperGrok Heavy tier and $2.00 input / $6.00 output API pricing also position Grok 4.20 as a product that is priced at the premium end for general consumers but significantly below competing frontier models for developers — a deliberate commercial strategy that reflects xAI’s goal of capturing the developer ecosystem as its primary long-term revenue base.

Grok 4.20 Model Versions & Release Timeline Global in 2026

  GROK MODEL EVOLUTION — RELEASE TIMELINE
  ════════════════════════════════════════════════════════════════
  Nov 2023  ██  Grok 1      — 314B params, Apache 2.0 open-source
  Mar 2024  ██  Grok 1.5    — 128K context window, open-sourced
  Aug 2024  ████  Grok 2    — Image generation added
  Feb 2025  ████████  Grok 3  — 10x compute vs predecessors
  Jul 2025  ████████████  Grok 4  — HLE 50.7% (Heavy), Colossus 200K GPU training
  Nov 2025  ██████████████  Grok 4.1 — 97% FActScore, 1,483 LMArena Elo (#1)
  Feb 2026  ████████████████  Grok 4.20 Beta — Multi-agent architecture launch
  Apr 2026  ████████████████████  Grok 4.20 0309 v2 — Speed & accuracy refinements
  Apr 2026  ████████████████████████  Grok 4.3 — Intelligence Index 53; $395 run cost
  ════════════════════════════════════════════════════════════════
  Total: 7 major versions in ~2.5 years | Avg: ~3 releases per year

Model | Release Date | Key Capability Added | Context Window
Grok 1 | November 3, 2023 | Base LLM; Python/Rust architecture; 314B parameters | Standard
Grok 1.5 | March 28, 2024 | 128,000-token context; enhanced reasoning | 128K
Grok 2 | August 14, 2024 | Image generation capabilities added | 128K
Grok 3 | February 17, 2025 | 10x compute vs. prior models; drove +436% traffic spike | 128K
Grok 4 | July 9, 2025 | HLE 50.7% (Heavy); ARC-AGI V2 15.9%; native tool use; real-time search | 256K
Grok 4.1 | November 17, 2025 | 97% FActScore; hallucinations cut 3x; #1 LMArena Elo (1,483) | 2M
Grok 4.20 Beta | February 17, 2026 | Multi-agent architecture (4 agents); ultra-precise answers | 2M
Grok 4.20 0309 v2 | April 7, 2026 | Speed, hallucination fixes, better LaTeX, multi-image rendering | 2M
Grok 4.3 | April 30, 2026 | Intelligence Index 53; GDPval-AA 1,500 Elo; ~23% lower benchmark cost | 1M

Source: SQ Magazine (May 2026), xAI official, Artificial Analysis (April 2026), Business of Apps (March 2026)

The Grok model release timeline is arguably the clearest expression of xAI’s competitive philosophy: move faster than anyone else and iterate in public. Reaching 7 major model versions in approximately 2.5 years — an average of nearly 3 major releases per year — is a cadence that exceeds most frontier AI labs, including OpenAI and Anthropic, who have historically favored longer release cycles. The leap from Grok 3’s 10x compute training jump in February 2025 to Grok 4’s landmark HLE 50.7% score just five months later in July 2025 shows how rapidly Colossus-scale compute investment translates into benchmark breakthroughs. And the jump from 128,000 tokens (Grok 3) to 2,000,000 tokens (Grok 4.1 and 4.20) is not an incremental context expansion — it is an order-of-magnitude change that unlocks entirely different enterprise use cases.
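The cadence and context-window claims can be reproduced from the release dates in the timeline. The sketch below takes the article's count of 7 major versions as given, which is an assumption carried over from the text rather than something derived from the table.

```python
from datetime import date

first_release = date(2023, 11, 3)    # Grok 1
latest_release = date(2026, 4, 30)   # Grok 4.3

# Elapsed time between the first and latest releases, in years.
span_days = (latest_release - first_release).days
span_years = span_days / 365.25

major_versions = 7  # count used in the article, taken as given
releases_per_year = major_versions / span_years

# Context-window growth from Grok 3 (128K) to Grok 4.1 / 4.20 (2M).
context_ratio = 2_000_000 / 128_000

print(f"Span: {span_years:.2f} years, {releases_per_year:.1f} releases/year")
print(f"Context window grew {context_ratio:.1f}x")
```

A growth factor of about 15.6x is what the text describes as an order-of-magnitude change.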

The Grok 4.20 multi-agent architecture — launched February 2026 — represents the most architecturally significant change in the series since Grok 4’s reinforcement learning training methodology. Rather than a single model responding to a query, four specialized agents debate and fact-check in real time, a design that directly targets the hallucination and reliability issues that have been the main enterprise objection to deploying large language models in production. The subsequent 0309 v2 release on April 7, 2026, focused specifically on speed, LaTeX rendering quality, image handling, and instruction-following, signals that xAI is now iterating on polish and reliability rather than purely on benchmark-maximizing capability — a sign of a model transitioning from research showcase to production product.
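xAI has not published how the four agents coordinate, so the following is only a toy illustration of the general idea of several agents proposing answers and a consensus step choosing among them. It uses plain majority voting as a stand-in technique; the stub agents, the `debate` function, and the sample question are all hypothetical and do not represent Grok's actual implementation.

```python
from collections import Counter
from typing import Callable, List, Tuple

# An "agent" here is just a function from a question to a proposed answer.
Agent = Callable[[str], str]


def make_stub_agent(answer: str) -> Agent:
    """Build a stub agent that always proposes `answer` (stands in for a model call)."""
    return lambda question: answer


def debate(question: str, agents: List[Agent]) -> Tuple[str, float]:
    """Collect one proposal per agent; return the majority answer and its agreement share."""
    proposals = [agent(question) for agent in agents]
    answer, votes = Counter(proposals).most_common(1)[0]
    return answer, votes / len(proposals)


# Four hypothetical agents, one of which disagrees with the others.
agents = [
    make_stub_agent("Paris"),
    make_stub_agent("Paris"),
    make_stub_agent("Lyon"),
    make_stub_agent("Paris"),
]
answer, agreement = debate("What is the capital of France?", agents)
print(answer, agreement)
```

A real system would presumably add critique and revision rounds; majority voting is used here only because it is the simplest consensus rule to show in a few lines.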

Grok 4.20 Global Competitive Position & Pricing in 2026

  INTELLIGENCE INDEX vs. COST — FRONTIER MODEL COMPARISON (Apr 2026)
  ════════════════════════════════════════════════════════════════
  Model               Intelligence   Output Price    Speed (t/s)
                      Index Score    (per 1M tokens)
  ────────────────────────────────────────────────────────────────
  Grok 4.3            53             $2.50           83.3 t/s
  Grok 4.20 0309 v2   49             $6.00           106.1 t/s  ← FASTEST
  Peer Median         35-36          $8.00 (median)  63 t/s
  ════════════════════════════════════════════════════════════════
  Grok 4.20 costs $514 to run full benchmark suite
  Grok 4.3 costs $395 to run full benchmark suite (-23%)

Model | Intelligence Index | Input Price (1M tokens) | Output Price (1M tokens) | Context Window | Speed (t/s)
Grok 4.3 (Apr 2026) | 53 | $1.25 | $2.50 | 1M | 83.3
Grok 4.20 0309 v2 | 49 | $2.00 | $6.00 | 2M | 106.1
Peer Reasoning Median | ~35–36 | ~$1.68–1.71 | ~$8.00 | Varies | ~61–63
Grok 4.20 vs. Median | +40% above | Slightly above avg | Below avg output cost | Largest (2M) | +67% faster

Source: Artificial Analysis (April 2026)

When benchmarked against the global field of frontier reasoning models, Grok 4.20 0309 v2 occupies a notable position: above-average intelligence at above-average speed, with output pricing that is below the peer median despite input pricing that is slightly elevated. The $6.00 per million output tokens — compared to the peer median of $8.00 — makes Grok 4.20 approximately 25% cheaper on the output side than the average competing model at its intelligence tier, which is the pricing dimension that matters most for high-volume API users generating large responses. Its 106.1 tokens-per-second output speed, the highest among the models evaluated, means that at scale, Grok 4.20 is not just cheaper per token — it delivers those tokens 67% faster than the average competing reasoning model.
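For API users, the per-token prices and speed figures above translate into workload cost and response time roughly as sketched below. The monthly token volumes are a hypothetical workload chosen for illustration, and the response-time model (time to first token plus output tokens divided by throughput) is a simplifying assumption, not a documented formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float    # USD per 1M input tokens
    output_per_m: float   # USD per 1M output tokens


# Prices as quoted in the comparison table.
PRICES = {
    "Grok 4.20 0309 v2": Pricing(2.00, 6.00),
    "Grok 4.3": Pricing(1.25, 2.50),
}


def workload_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for a given number of input and output tokens."""
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


def estimated_seconds(output_tokens: int, ttft: float = 22.31, tps: float = 106.1) -> float:
    """Rough response time: time to first token plus streaming time (assumed model)."""
    return ttft + output_tokens / tps


# Hypothetical monthly workload: 500M input tokens, 100M output tokens.
costs = {name: workload_cost(p, 500_000_000, 100_000_000) for name, p in PRICES.items()}

# Output-price discount of Grok 4.20 ($6.00) against the peer median ($8.00).
output_discount_pct = (1 - 6.00 / 8.00) * 100

print(costs)
print(f"Output discount vs. median: {output_discount_pct:.0f}%")
print(f"Estimated time for a 2,000-token answer: {estimated_seconds(2_000):.1f}s")
```

Under this model the 22.31-second time to first token accounts for more than half the wait on a 2,000-token answer, so the throughput advantage matters most for long outputs.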

The comparison between Grok 4.20 and its own successor Grok 4.3 is equally instructive. Grok 4.3 scores 53 on the Intelligence Index (four points higher) and costs $395 to run the benchmark suite versus Grok 4.20’s $514 — a 23% cost reduction alongside a capability improvement. This pattern of xAI releasing more capable models at lower cost within weeks of each other is a deliberate competitive signal: the company is actively trying to undercut the “frontier models are expensive” narrative that has slowed enterprise AI adoption. For businesses evaluating which model to deploy in production, the 2 million token context window that Grok 4.20 retains — vs. Grok 4.3’s 1 million tokens — means there is still a genuine use-case argument for 4.20 even after 4.3’s release, particularly for long-document or full-codebase analysis tasks.
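The trade-off described above, lower cost on Grok 4.3 against a larger context window on Grok 4.20, can be expressed as a simple check. The sketch below assumes the whole prompt must fit within the context window and ignores tokens reserved for output; the function name `models_that_fit` is hypothetical.

```python
from typing import List

# Benchmark-suite run costs in USD, as quoted in the comparison chart.
run_cost_420, run_cost_43 = 514.16, 395.0
cost_reduction_pct = (1 - run_cost_43 / run_cost_420) * 100

# Context windows in tokens, as listed in the comparison table.
CONTEXT_LIMITS = {
    "Grok 4.3": 1_000_000,
    "Grok 4.20 0309 v2": 2_000_000,
}


def models_that_fit(prompt_tokens: int) -> List[str]:
    """Return the models whose context window can hold the prompt (assumed rule)."""
    return [name for name, limit in CONTEXT_LIMITS.items() if prompt_tokens <= limit]


print(f"Benchmark cost reduction: {cost_reduction_pct:.1f}%")
print(models_that_fit(800_000))     # both models fit
print(models_that_fit(1_500_000))   # only the 2M-token model fits
```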

Disclaimer: The data research report we present here is based on information found from various sources. We are not liable for any financial loss, errors, or damages of any kind that may result from the use of the information herein. We acknowledge that though we try to report accurately, we cannot verify the absolute facts of everything that has been represented.
