36Kr Exclusive | Four Key Propositions for ByteDance AI in 2026

ByteDance AI has set itself four ambitious goals for 2026. The most intriguing is the one it entered latest, is chasing hardest, and may be betting its future on: the world model. When Wu Yonghui declared at the Seed all-hands meeting, "We must match Google Genie 3 by the end of the year," the room likely held not only ambition but also a note of urgency, like a student forced to make up missed lessons. Internal evaluations show that a 10% performance gap remains.


Analysis

ByteDance’s AI lineup was once praised for having "no obvious weaknesses": Seed 2.0, Seedance 2.0, Doubao’s 200 million daily active users. The report card was enviable. But the gap in world models exposes the short-term utilitarianism of its technical roadmap. In 2024, when Zhou Chang took the lead, the internal judgment was that "the scenarios are unclear, so fight the video-model battle first," a classic big-company mindset: do whatever is trending and invest in whatever can scale quickly. Only in 2025 was a team set up to explore VLA (Vision-Language-Action), and in early 2026 the two lines of work were merged and tens of millions of yuan were poured into the data budget. This is not strategic foresight; it is a "first move" made in a hurry after Google and OpenAI had already staked out positions in embodied intelligence. The data investment is reportedly 3-4 times that of other vendors, a return to the familiar "sea of data" tactics. The problem is that the core of a world model is understanding the dynamic logic of the physical world, which cannot be built simply by feeding in more video. This path dependency on "brute force works miracles" reveals confusion about the underlying cognitive framework.

More intriguing still is the awkward position of the Coding business. Despite investment "second only to the world model," it has kept a low profile, and internal products have been unwilling to use the company’s own Seed-Code. The reason is blunt: the model is not capable enough, so business units procure DeepSeek or Claude externally. Real feedback data therefore never flows back, which makes the model even harder to improve: a perfect death spiral. Only in 2026, when application departments were required to use Seed models, did a basic closed loop form. Pushing a technical product through by administrative order is rather ironic at a technology-driven company. Coding should have been the foundation of Agent capabilities and a whetstone for sharpening the model’s logical reasoning; instead it has become a testing ground for internal political economy. When your "dogfooding" (using your own product internally) has to be enforced, your product is not yet strong enough to win voluntary adoption.

As for Seedance, which holds the SOTA (state-of-the-art) position, its success is summarized as "a victory of data"—a 2,000-person evaluation team, massive training datasets. This remains ByteDance’s signature scaling play. But the next frontier in video generation is "dynamic generation," which involves understanding motion physics and long-term consistency. This can no longer be solved by simply throwing more data at it. At the inflection point where generative AI moves from "perception" to "action," ByteDance still seems to trust the familiar scale effects over a disruptive restructuring of cognitive architecture.

Overall, ByteDance’s 2026 AI strategy resembles a carefully calculated makeup exam: using the most abundant funding and the densest concentration of talent to quickly fill the gaps left by earlier strategic wavering. The "bet" on world models acknowledges that the future track is uncertain and chooses to cover that risk with resources. The forced data return for Coding is an administrative correction after admitting that internal collaboration failed. Sustaining the video models continues an existing path dependency. ByteDance may well get results through sheer money and manpower, but the real challenge lies ahead: as AI competition shifts from "who runs faster" to "who thinks deeper," can a company accustomed to winning through iteration speed and engineering scale still find the narrow gate of real technology that resources cannot force open? Chasing SOTA matters, but if you are merely running after your opponent’s shadow, you may reach the same line and still have strayed far from the true endgame.

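ByteDance AI has planted four flags for itself in 2026. The most intriguing is the one it entered latest, is chasing hardest, and may be betting its future on: the world model. When Wu Yonghui called out "match Google Genie 3 by the end of the year" at the Seed all-hands meeting, what echoed in the room was probably not only ambition but also the urgency of being forced to make up missed lessons. Internal evaluations show that a 10% performance gap remains, and Wu has repeatedly said bluntly that results are "below expectations." The scene is much like a top student who suddenly discovers he skipped the core subject that decides his future and can only flip through the textbook while reassuring himself: there is still time, there is still time.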


Disclaimer: The above content is generated by AI and is for reference only.

Video Generation Agent Programming
