May 2026: AI Enters the Infrastructure Era — From Model Races to Engineering Wars

## The End of an Era: Model Competition's Marginal Returns Are Approaching Zero
The AI industry in May 2026 presents a paradoxical picture: technological breakthroughs have never stopped, yet the commercial value of being the 'best model' is depreciating faster than ever.

In less than a month, virtually every major player shipped a significant update: GPT-5.5, Claude Sonnet 4.6, Gemini 3.5 Flash, DeepSeek v4. But unlike in 2023-2024, a new model release is no longer a 'crushing the competition' moment; it is a 'catching up' statement. Model capability convergence has become an irreversible structural trend.

:::highlight
IBM Granite 4.1 achieves performance comparable to 32B MoE models with only 8B parameters. DeepSeek v4's API costs are one-third of GPT-5.5's. Parameter count is no longer a moat; data quality and training efficiency are what matter.
:::

The more critical shift is in the market. Anthropic shipped nearly 20 major product updates in 12 weeks, growing annualized revenue from under $9 billion to over $30 billion, and valuation from $61.5 billion to $900 billion — surpassing OpenAI's $852 billion private valuation. Claude Opus 4.7 built a reputation among enterprise developers not through benchmark scores, but through real-world performance in long-context reasoning and code generation.

Meanwhile, OpenAI filed its IPO paperwork targeting a trillion-dollar valuation. Google's Gemini 3.5 Flash, announced at I/O 2026, costs 1/15th of GPT-5.5 for inference while matching or exceeding GPT-5.5 on agent workflow benchmarks.

These signals converge on one reality: model capability itself is transitioning from a competitive advantage to a qualifying threshold. When everyone has access to near-frontier capability, true differentiation lies not in the model, but in the system built around it.

## Enterprise Deployment: The New Front Line in the AI War

If 2024-2025 was defined by 'model training,' 2026 is defined by 'enterprise deployment.' Two stories from May brought this shift into sharp focus.

First, KPMG deployed Claude to 276,000 employees across 138 countries. This is not a pilot; it is a firm-wide, full-workflow integration spanning tax, audit, and advisory. Claude is embedded in KPMG's Digital Gateway platform, becoming a standard tool for every consultant. Combined with similar deals at PwC and Deloitte, this means three of the Big Four accounting firms have chosen Anthropic as their enterprise AI partner within 60 days.

Second, OpenAI launched DeployCo — a $4 billion consulting subsidiary backed by a 19-firm consortium led by TPG, including Goldman Sachs, Bain Capital, McKinsey, and Capgemini. DeployCo operates on a Palantir-style model: rather than selling licenses and leaving integration to customers, it places Forward Deployed Engineers directly inside client organizations to build and operate production AI systems. Through the acquisition of Edinburgh-based Tomoro, DeployCo launched with 150 FDEs on day one.

Both stories share the same insight: model performance is no longer the primary bottleneck for enterprise adoption. Integration into messy real-world systems, change management, evaluation frameworks, and security review are human-intensive engineering tasks, and they are the actual constraints.

This insight is reshaping AI revenue structures. The company that controls the enterprise deployment layer (the systems, workflows, and organizational relationships through which AI capability reaches end users) will capture more durable revenue than the company that simply supplies the best model API.

## Agents: From Prompt Engineering to Runtime Systems Engineering

One of the most profound shifts of 2026 is the evolution of AI Agents from experimental concepts to systems engineering disciplines.

2023: Agent = Prompting tricks + function calling

2024: Agent = Multi-step reasoning + tool use

2025: Agent = Workflow orchestration + state management

2026: Agent = Runtime systems (recoverable, observable, governable, scalable)
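
To make the 2026 entry more concrete, here is a minimal sketch of what "recoverable, observable, governable" can mean in code. It is an illustration under stated assumptions, not any vendor's runtime: the tool names, the `ALLOWED_TOOLS` allow-list, the `MAX_STEPS` cap, and the JSON checkpoint format are all hypothetical, and the plan is a fixed list where a real agent would ask a model for the next step. Scalability is deliberately left out.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical tool registry; names and behaviour are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query}",
    "delete_records": lambda target: f"deleted {target}",
}
ALLOWED_TOOLS = {"search"}  # governable: explicit allow-list
MAX_STEPS = 5               # governable: hard cap on total steps in a plan


def run_agent(plan: list[tuple[str, str]], checkpoint: Path) -> dict:
    """Run a fixed plan of (tool, argument) steps, checkpointing after each."""
    # Recoverable: resume from the saved state if a checkpoint file exists.
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    else:
        state = {"step": 0, "outputs": [], "trace": []}

    while state["step"] < min(len(plan), MAX_STEPS):
        tool, arg = plan[state["step"]]
        # Governable: disallowed tools are refused and recorded, not executed.
        output = TOOLS[tool](arg) if tool in ALLOWED_TOOLS else "DENIED"
        # Observable: one structured trace record per step.
        state["trace"].append({"step": state["step"], "tool": tool, "output": output})
        state["outputs"].append(output)
        state["step"] += 1
        checkpoint.write_text(json.dumps(state))  # persist progress after each step
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "run.json"
        plan = [("search", "capex 2026"), ("delete_records", "crm"), ("search", "MCP")]
        run_agent(plan[:1], path)      # simulate a run interrupted after one step
        final = run_agent(plan, path)  # a second call resumes at step 1
        for record in final["trace"]:
            print(record)
```

The point of the sketch is that none of these properties comes from the model: the checkpoint file, the trace, and the allow-list all live in the surrounding system.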

Over the past two years, the center of gravity for AI Agents has shifted from 'connecting a stronger model to a few functions' to 'placing the model inside a recoverable, observable, governable, and scalable runtime system.'

Anthropic shipped 74 product updates in 52 days, most focused on deepening programming and enterprise workflow capabilities: Skills, MCP, Memory, Compaction, Context Editing, Advisor, Managed Agents. Each fills a gap in the Agent runtime stack.

Google's Gemini Spark, launched at I/O 2026, is a 24/7 cloud-native personal agent. It runs on dedicated virtual machines, continues working when your phone is locked, integrates natively with Gmail, Google Docs, and Workspace, and has its own dedicated email address, so users can email the agent directly. This is not a 'chatbot'; it is a stateful, identity-bearing, persistently running digital employee.

OpenAI's Background mode, Sessions, Agents SDK, Tracing, and Evals are all building the same complete Agent runtime picture. The industry is converging on a consensus: Agent competition has moved from 'who writes better prompts' to 'who builds better systems.'

## Capital Shift: $587 Billion Poured into Infrastructure

The AI capital expenditure numbers for 2026 are staggering. Amazon, Google, Meta, and Microsoft alone are projected to spend $587-670 billion on AI capex, nearly double 2025 levels.

What matters more than the scale is where the money is going.

In 2023, the bottleneck was training the best model. In 2024, it was GPU supply (the H100 shortage). In 2025, inference compute became the new constraint: inference's share of AI compute spending doubled from 33% to 66%. In 2026, the bottleneck sinks further, to electricity and physical infrastructure. Global data center power consumption is projected to exceed 1,000 TWh a year, approaching Japan's national annual electricity usage.

For every $1 spent on AI, approximately $2.30 must be spent on supporting infrastructure. Today's AI arms race is no longer primarily about funding model development; it is about data centers, power supply, and delivery capability.

Anthropic's compute crisis is the most direct example: the model won, revenue hit all-time highs, but infrastructure couldn't keep up, forcing the company to degrade output quality. Leadership itself became a liability.

SpaceX's $1.25 billion/month compute contract with Anthropic, and SpaceX's impending IPO at a $1.75 trillion valuation with plans to acquire Cursor (valued at $60 billion), are footnotes to this trend: AI infrastructure itself has become a financial asset class.

## The Rise of Small Models: Cost Efficiency Replaces Parameter Races

Another critical signal from May 2026 is the collective rise of small-parameter models.

DeepSeek v4, through MoE architecture and optimization for Chinese domestic hardware, compressed costs to one-third of GPT-5.5's. IBM Granite 4.1's 8B model matches 32B MoE performance. Google Gemini 3.5 Flash achieves 280+ tokens/s inference at 1/15th the cost of GPT-5.5.

Frontier Models

- GPT-5.5: Highest cost, largest user base
- Niche: Complex reasoning, high-value tasks
- Use cases: Research, code generation, deep analysis

Efficient Models

- DeepSeek v4 Flash: 1/3 cost of GPT-5.5
- IBM Granite 4.1: 8B rivals 32B MoE
- Gemini 3.5 Flash: 280+ tok/s, 1/15 cost
- Niche: High throughput, low latency, cost-sensitive

This isn't just price competition; it's a fundamental restructuring of AI application economics. When inference costs drop by 1-2 orders of magnitude, previously uneconomical AI use cases become viable. The cost constraints on every Agent tool call and every multi-step reasoning turn are dramatically relaxed, which in turn accelerates Agent adoption.

Anthropic's Tool Search documentation shows that multi-service tool definitions can consume approximately 55k tokens, while dynamic on-demand loading typically reduces tool context by 85%+. OpenAI's Prompt Caching can reduce input costs by up to 90% and latency by up to 80%. These optimizations sit at the system level rather than the model level, further confirming that 'engineering' has replaced 'model capability' as the competitive differentiator.

## The New Open-Source vs. Closed-Source Dynamics

The open vs. closed source debate enters a new phase in 2026. DeepSeek v4 and IBM Granite 4.1 demonstrate that open-weight models are rapidly closing the gap with closed-source counterparts.

More notably, Chinese AI companies are rising globally. Goldman Sachs ranks ByteDance, Alibaba, and Minimax as China's independent AI first tier. Minimax, the only independent Chinese company with full-stack capabilities across text, image, video, audio, and music, is seen by Morgan Stanley as following the closest technical trajectory to Google's Gemini Omni, with ARR projected to reach $1 billion by end of 2026. Alibaba's Bailian MaaS platform has surpassed RMB 8 billion in ARR and is on track to exceed RMB 30 billion by year-end. Chinese LLMs have surpassed US models in global weekly API calls, which is not just a volume victory but a structural advantage driven by cost competitiveness.
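
The system-level savings quoted earlier (roughly 55k tokens of tool definitions, an 85%+ reduction from on-demand loading, and up to 90% lower input cost from prompt caching) lend themselves to a quick back-of-the-envelope check. The sketch below is only arithmetic on those quoted figures. The function names and the `cached_fraction` parameter are hypothetical, the two levers come from different vendors and are modelled separately, and actual billing rules may differ.

```python
# Figures quoted in the text; how they are modelled here is an assumption.
TOOL_DEFINITION_TOKENS = 55_000  # approx. multi-service tool definitions
ON_DEMAND_REDUCTION = 0.85       # "85%+" reduction, taken at its lower bound
CACHE_DISCOUNT = 0.90            # "up to 90%" input saving, taken at its upper bound


def tool_context_tokens(on_demand: bool) -> int:
    """Tokens spent on tool definitions per request."""
    if on_demand:
        return round(TOOL_DEFINITION_TOKENS * (1 - ON_DEMAND_REDUCTION))
    return TOOL_DEFINITION_TOKENS


def billed_token_equivalents(tokens: int, cached_fraction: float) -> float:
    """Input cost expressed in full-price token equivalents.

    cached_fraction is a hypothetical cache-hit share between 0 and 1;
    cached tokens are billed at (1 - CACHE_DISCOUNT) of the full price.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = tokens * cached_fraction
    return (tokens - cached) + cached * (1 - CACHE_DISCOUNT)


if __name__ == "__main__":
    print("upfront tool tokens:  ", tool_context_tokens(on_demand=False))
    print("on-demand tool tokens:", tool_context_tokens(on_demand=True))
    print("fully cached prefix:  ",
          round(billed_token_equivalents(TOOL_DEFINITION_TOKENS, 1.0)))
```

Under these assumptions, on-demand loading leaves about 8,250 tokens of tool context per request, and a fully cached 55k-token prefix is billed like roughly 5,500 full-price tokens. Either lever changes per-call economics without touching the model.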
## Conclusion: From 'Who Is Strongest' to 'Who Lasts Longest'

May 2026 marks a structural inflection point for the AI industry. Three simultaneous forces define this moment:

1. **Technical convergence**: Narrowing model capability gaps render 'being the best' strategically meaningless
2. **Competition spillover**: The battlefield expands from models to deployment, engineering, and infrastructure, leaving 'dominance' without a stable foundation
3. **Capital acceleration**: Trillion-dollar capital expenditures raise expectations and shorten switching cycles

These three layers compound to change the rules of survival in AI. What determines whether a company can continue expanding, and who will be forced to decelerate, is increasingly not benchmark scores but a more complex set of variables: whether inference capability has achieved a leap, whether Agent capabilities are perceptible to users, whether the business model is sustainable, and whether the ecosystem moat is high enough.

The most forward-looking judgment, therefore, is this: in the next one to two years, the teams most likely to win the market will not necessarily be those that build the 'most capable model.' They will be the teams that design the clearest boundaries between workflow and agent, tool and protocol, context and memory, model and runtime, freedom and governance.

AI competition has moved from the era of 'who is strongest' to the era of 'who lasts longest.'


