Zhiwei Future CEO Mei Tao: The gross margin on multimodal model tokens is far higher than on language model tokens
Zhiwei Future is positioning itself as a native multimodal AI company pursuing world models, betting on architectural innovation and synthetic data to overcome training data scarcity. The company is commercializing via a B2B MaaS platform focused on video generation for marketing, film, and social media, securing significant funding while competing against larger rivals.
Deep Analysis
Defining the Pursuit: Native Multimodality as a Stepping Stone
While many companies claim to be building world models, Zhiwei Future is careful not to apply that label to its current work. CEO Mei Tao explicitly states, "We would not declare ourselves a world model company today." Instead, the company defines itself as a "native multimodal large model company," viewing this as a necessary and practical step on the path toward a true world model. This framing sets a more achievable near-term goal while keeping the ambitious long-term vision intact.
A Contrarian Technical and Data Strategy
Faced with the industry's data bottleneck for world models, Zhiwei Future is taking a different path from the dominant approaches. Its strategy centers on algorithmic and architectural innovation that makes low-cost synthetic data usable, rather than competing solely on data volume and compute. At the core is its "Original Multimodal Unified Transformer (UiT)" architecture, designed for "Any to Any" capabilities, which the company believes are intrinsic to how a world model functions.
This approach directly addresses the immense cost and scarcity of real-world multimodal data. The company starts with a limited set of high-quality, proprietary video data, then uses its video models to generate thousands of synthetic variants (varying the scene and the demographics of the people shown, for example) to train embodied intelligence models such as VLA and WAM. The result is a data flywheel: the core model's generation capability bootstraps the more specialized data that downstream applications need.
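As a rough illustration of the variant-generation step described above, the sketch below enumerates one variant specification per combination of seed clip, scene, and demographic. It is only a planning sketch under stated assumptions: the names `VariantSpec` and `plan_variants`, and all clip, scene, and demographic values, are hypothetical and do not come from Zhiwei Future. The actual video generation call is not shown.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class VariantSpec:
    """One synthetic variant to request from a video model (hypothetical schema)."""
    seed_clip: str
    scene: str
    demographic: str


def plan_variants(seed_clips, scenes, demographics):
    """Return one spec per (seed clip, scene, demographic) combination."""
    return [
        VariantSpec(clip, scene, demo)
        for clip, scene, demo in product(seed_clips, scenes, demographics)
    ]


# Illustrative placeholder values, not real data.
specs = plan_variants(
    seed_clips=["clip_001", "clip_002"],
    scenes=["kitchen", "warehouse", "street"],
    demographics=["adult", "senior"],
)
print(len(specs))  # 2 clips * 3 scenes * 2 demographics = 12 specs
```

The point of the sketch is the multiplication: a small proprietary seed set grows combinatorially once each attribute is varied independently, which is how a few seed clips can yield thousands of training variants.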
Business Model: The MaaS Platform and Commercial Timeliness
Zhiwei Future is actively transitioning from a pure model developer to an application platform provider. Its commercial framework is a "1+1+3" MaaS (Model as a Service) stack: a foundation of HiDream models, a middle-layer HiHarness enterprise service platform, and three application scenarios (commercial marketing, film creation, and social media creation).
Investor Wang Bing highlights a key financial insight: "The gross margin for multimodal model tokens is far higher than that for large language model tokens." This underpins the commercial viability of the company's B2B focus. Both the company and its investors believe the video generation sector is at an inflection point where output quality now meets commercial standards, and that rapidly falling compute costs will soon make previously unprofitable projects viable. The strategy is to use cost-efficient model development to capture these emerging B2B verticals quickly, ahead of larger competitors.
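The margin argument above can be made concrete with the standard definition of gross margin, (price - cost) / price. The sketch below is illustrative only: the function name `gross_margin` and every price and cost figure are assumptions chosen to show the mechanism, not numbers reported by the company or its investors.

```python
def gross_margin(price_per_unit: float, cost_per_unit: float) -> float:
    """Gross margin as a fraction of price: (price - cost) / price."""
    if price_per_unit <= 0:
        raise ValueError("price_per_unit must be positive")
    return (price_per_unit - cost_per_unit) / price_per_unit


# Hypothetical figures: the price stays fixed while compute cost falls.
price = 1.00
for cost in (1.25, 1.00, 0.50):
    margin = gross_margin(price, cost)
    status = "viable" if margin > 0 else "not viable"
    print(f"cost={cost:.2f} margin={margin:+.0%} {status}")
```

With the price held constant, the same project moves from a negative margin to break-even to a positive margin purely because the per-unit compute cost drops, which is the mechanism behind the claim that falling compute costs turn unprofitable projects into viable ones.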