City-Level AI Services: From Pilot Programs to Routine Operations, Robots in Real-World Operations and Scaled Implementation | 2026 AI Partner · Beijing Yizhuang AI+ Industry Conference

The Core Challenge: Data as the Bottleneck for Embodied Intelligence

The article presents a pivotal argument: the primary barrier to advancing embodied intelligence is not algorithmic innovation but the lack of massive, real-world data. Unlike Large Language Models (LLMs) that train on abundant text data, training robots for complex physical tasks requires enormous volumes of interaction data from dynamic, real-world environments. The article draws a contrast with autonomous driving (Robotaxi), noting that while companies like Tesla leverage millions of vehicles for data collection, no analogous, mass-market “embodied intelligent terminal” exists for robotics. This creates a fundamental chicken-and-egg problem: without large-scale deployment, there’s no data; without data, the intelligence cannot evolve.

The Strategic Solution: “Fight to Train” and the World Model

KuaWhat’s strategic response is the “fight to train” model: robots learn on the job rather than in rehearsal. This means moving beyond controlled testing grounds and deploying robots directly into real-world, revenue-generating services (like street cleaning and shuttle operations). Each robot becomes a mobile data collector as it navigates complex, unstructured environments. The operational data on navigation, object interaction, and task execution feeds back to refine and scale the AI models.

This approach is powered by a technological shift towards World Models. As explained in the speech by COO Li Kehong, 2023 marked a watershed moment. Newer World Models (like KuaWhat’s CooWAIM) are fundamentally different from previous modular robotics architectures. They are built on generative AI and are designed to simulate and predict physical world outcomes. By understanding environmental observations, they can forecast future actions and incorporate causal physics into decision-making. This creates a more robust and adaptable intelligence for robots.
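As a rough illustration of what "simulate and predict" can mean in practice, the sketch below plans with a toy world model: it imagines the outcome of each candidate action before committing to one. The hand-written physics in `predict_next`, the `State` fields, and the candidate accelerations are all hypothetical stand-ins. A system like CooWAIM would presumably use a learned generative model, whose internals the speech does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    position: float  # metres travelled along the lane
    speed: float     # metres per second


def predict_next(state: State, accel: float, dt: float = 0.5) -> State:
    """Toy stand-in for a learned world model: predict the state dt seconds ahead."""
    speed = max(0.0, state.speed + accel * dt)  # the robot never reverses
    return State(position=state.position + speed * dt, speed=speed)


def rollout(state: State, accel: float, steps: int) -> State:
    """Imagine holding one action for several steps without moving the real robot."""
    for _ in range(steps):
        state = predict_next(state, accel)
    return state


def choose_action(state: State, obstacle_at: float,
                  candidates=(-2.0, 0.0, 1.0), steps: int = 6) -> float:
    """Pick the largest acceleration whose imagined rollout stays short of the obstacle."""
    safe = [a for a in candidates
            if rollout(state, a, steps).position < obstacle_at]
    # If no action is predicted to be safe, fall back to the hardest braking.
    return max(safe) if safe else min(candidates)
```

Calling `choose_action(State(0.0, 2.0), obstacle_at=5.0)` selects braking, because the imagined rollouts for coasting and accelerating both pass the obstacle within the planning horizon.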

Architecture and Application: The “One Brain, Multiple Forms” Paradigm

KuaWhat’s CooWAIM model uses a dual-system architecture:

The Intuitive Action System: Provides real-time, vision-based reasoning for immediate safety and efficiency (e.g., avoiding a sudden obstacle).
The Long-term Task Reasoning System: Handles global planning, semantic understanding, and complex task execution (e.g., planning a sanitation route).
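One way to picture how the two systems might divide the work is a slow planner that proposes a route and a fast reflex layer that can override any single step. The function names, the one-metre safety margin, and the command strings below are hypothetical; the speech does not describe how CooWAIM arbitrates between its two systems.

```python
from typing import List, Optional


def plan_route(stops: List[str]) -> List[str]:
    """Slow system: global planning. Alphabetical ordering is only a placeholder."""
    return sorted(stops)


def reflex(nearest_obstacle_m: Optional[float],
           safety_margin_m: float = 1.0) -> Optional[str]:
    """Fast system: react to the latest perception reading, or return None to defer."""
    if nearest_obstacle_m is not None and nearest_obstacle_m < safety_margin_m:
        return "stop"
    return None


def control_step(next_stop: str, nearest_obstacle_m: Optional[float]) -> str:
    """The reflex gets the first say; otherwise the robot follows the plan."""
    override = reflex(nearest_obstacle_m)
    return override if override is not None else f"drive_to:{next_stop}"
```

The design choice illustrated here is priority: the fast path can veto the plan for safety, while the slow path keeps ownership of the overall goal.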

These systems jointly enable two core capability domains:

Drive (Full-domain Mobility): Allowing robots to navigate diverse terrains, from main roads to cluttered pedestrian sidewalks and indoor spaces.
Work (Multi-joint Collaborative Operation): Integrating complex actuators like cleaning brushes and robotic arms, moving beyond simple pick-and-place to tightly coupled mobility and manipulation tasks.

This “one brain, multiple forms” architecture allows the same core AI to power different robot bodies—sanitation robots, autonomous buses, and delivery bots—across five key scenarios: sanitation, transportation, instant delivery, property management, and home services.
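The idea of one core model driving several robot bodies can be sketched as a shared decision component behind a common body interface. Every class and method name below is hypothetical and chosen for illustration only; the article does not describe CooWAIM's actual interfaces.

```python
from typing import Dict, Protocol


class Body(Protocol):
    """Anything that can turn a high-level intent into body-specific behavior."""
    def execute(self, intent: str) -> str: ...


class SanitationRobot:
    def execute(self, intent: str) -> str:
        return f"sweeper: {intent} with brushes lowered"


class ShuttleBus:
    def execute(self, intent: str) -> str:
        return f"bus: {intent} with doors closed"


class SharedBrain:
    """One decision-making core reused across every body."""
    def decide(self, observation: Dict[str, bool]) -> str:
        return "yield" if observation.get("pedestrian_ahead", False) else "proceed"


def step(brain: SharedBrain, body: Body, observation: Dict[str, bool]) -> str:
    # The brain decides what to do; the body decides how to do it.
    return body.execute(brain.decide(observation))
```

With this split, `step(SharedBrain(), SanitationRobot(), {"pedestrian_ahead": True})` and the same call with `ShuttleBus()` share one decision ("yield") but produce different body-level commands.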

Economic Logic and the Path to Scale

A critical insight from the article is that technological capability must align with economic viability for successful scaling. KuaWhat emphasizes “economic rhythm”—deploying robots where the technology, product, and business models are mature enough to be profitable.

The progression is deliberately staged:

Start with open, high-volume scenarios (like sanitation): Achieve scale (e.g., 10,000 units) in the “Drive” domain first. Mastering unstructured mobility in complex city environments generates foundational data and proves economic value. According to the article, saving 20% of operating time, for example, translates to roughly 20% higher gross margin.
Expand to scenarios combining mobility and simple manipulation (like instant delivery): This represents the next growth phase.
Eventually move to more controlled, complex environments (property management, homes): This is the long-term vision, requiring more advanced manipulation capabilities and even richer data.
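The article does not show the arithmetic behind the 20% figure in the sanitation stage. One reading under which it roughly holds is sketched below, with three assumptions that are not in the source: contract revenue is fixed, operating cost scales linearly with operating time, and the starting margin is thin. All figures are made up for illustration.

```python
def gross_margin(revenue: float, cost: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cost) / revenue


def margin_after_time_saving(revenue: float, cost: float,
                             time_saving: float) -> float:
    """Margin after cutting operating time, assuming cost scales with time."""
    return gross_margin(revenue, cost * (1.0 - time_saving))


# Hypothetical contract: revenue of 100 against a time-driven cost of 95.
before = gross_margin(100.0, 95.0)
after = margin_after_time_saving(100.0, 95.0, 0.20)
gain_in_points = (after - before) * 100
```

With these illustrative numbers the margin moves from 5% to 24%, a gain of about 19 percentage points. The gain equals the time saving multiplied by the cost-to-revenue ratio, so it shrinks as the starting margin grows: with cost at half of revenue, the same 20% saving adds only 10 points.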

Broader Implications and Conclusion

The article positions China as a uniquely advantageous market for this “fight to train” strategy due to its scale, urban complexity