A Tsinghua University Gen-Z Team Raises Tens of Millions in Two Funding Rounds to Address Token Billing Anxiety | Smart Emergence Exclusive

When a team of PhDs born after 2000 claims it can run an 80B-parameter large model in just 4GB of memory, the tech community’s first reaction is usually, “Another PPT startup.” But the tens of millions in funding WanGe ZhiYuan recently secured at least suggest that the capital market’s appetite hasn’t been ruined by the AI bubble just yet. Founded by Tsinghua PhD Wang Guanbo, the company is betting on a path directly opposite to mainstream cloud services: running large models on your local device, and running them fast.

Analysis

Their core argument is sharp: cloud-based inference is a dead end. No matter how far token costs drop, they can’t keep up with the rate at which AI agent tools consume computing power. Tools like Claude Code and OpenClaw have caused an explosion in token demand, while chip manufacturers are struggling with devices capped at 32GB of memory, which is close to the physical limit for edge hardware. WanGe ZhiYuan’s cPilot engine claims to run an 80B model in memory that would normally hold only a 4B model, and to do so 12 times faster than other solutions.
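To get a feel for the scale of that claim, here is a back-of-the-envelope sketch of how much memory model weights alone occupy at common precisions. The function name and the choice of precisions are illustrative assumptions, not figures from WanGe ZhiYuan, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 10^9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Compare an 80B model with a 4B model at 16-, 8-, and 4-bit precision.
for bits in (16, 8, 4):
    big = weight_memory_gb(80, bits)
    small = weight_memory_gb(4, bits)
    print(f"{bits}-bit: 80B needs {big:.0f} GB, 4B needs {small:.0f} GB "
          f"({big / small:.0f}x gap)")
```

Even at 4 bits per weight, the 80B model’s weights come to roughly 40GB, more than a 32GB device holds. Under these assumptions quantization alone cannot close the 20x gap, so any engine making this claim presumably avoids keeping every weight resident in memory at once.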

These figures deserve close scrutiny. If true, they mean a standard AI PC’s inference capability could rival that of a small server, while saving about 2,000 yuan in hardware costs per machine. But the word “if” is critical. The industry remains cautious about such claims: dynamic sparse activation algorithms sound great, but how can they guarantee the model doesn’t lose accuracy under such extreme compression? Could the parameters that are “predicted” to be skippable turn out to be the key links in answering complex questions? What the tech team needs to prove isn’t peak performance, but stable and reliable inference quality in daily use.
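The accuracy worry can be made concrete with a minimal sketch of generic top-k expert routing, the textbook form of sparse activation in MoE models. This is not cPilot’s algorithm, which the article does not describe; the router scores and function names below are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(scores, k):
    """Return (indices of the k highest-scoring experts, routing mass skipped)."""
    probs = softmax(scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    active = sorted(ranked[:k])
    # Probability mass assigned to experts that never run for this token.
    skipped_mass = sum(probs[i] for i in ranked[k:])
    return active, skipped_mass


# Hypothetical router scores for one token over 8 experts.
scores = [2.0, 0.1, -1.0, 1.5, 0.0, -0.5, 0.3, 1.4]
active, skipped = top_k_route(scores, k=2)
print(f"active experts: {active}, routing mass skipped: {skipped:.2f}")
```

The smaller `k` is, the less memory and compute each token needs, but the more routing mass is discarded. Whether that discarded portion matters for hard questions is exactly what benchmark results on everyday workloads would have to show.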

What’s even more interesting is their strategic choice: they’re not building small models, and they’re not doing post-training. This contrarian approach amounts to a hedge against industry trends. While everyone else competes on cloud pricing, they’re betting that demand for local deployment will suddenly surge. Last year they were still doing adaptation work for chip manufacturers; this year the explosion of OpenClaw gave them a glimpse of a consumer-facing opportunity. This kind of agile pivot may be the advantage of a young team: no historical baggage, and ready to change course at any moment.

But here’s the question: edge intelligence has been hyped for years, so why hasn’t it truly taken off? Wang Guanbo attributes it to “overlooked memory consumption,” but this diagnosis may be oversimplified. Is the reluctance to run models locally solely due to insufficient memory? Aren’t privacy concerns, the hassle of updates and maintenance, and the lack of killer apps even more critical factors? Their Amis platform aims to act as an “orchestration hub” that automatically allocates work between cloud and local computing power. This is a clever idea, but ecosystem building has never been something technology can solve alone.
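What an “orchestration hub” decides can be pictured with a toy routing rule. The article gives no detail on how Amis actually allocates work, so everything below is a hypothetical sketch: the `Request` fields, the `choose_backend` function, and the 8,192-token local limit are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int           # size of the prompt
    contains_private_data: bool  # whether the user marked the data as sensitive


def choose_backend(req: Request,
                   local_context_limit: int = 8192,
                   cloud_available: bool = True) -> str:
    """Pick 'local' or 'cloud' for one request (hypothetical policy)."""
    if req.contains_private_data:
        return "local"   # sensitive data never leaves the device
    if not cloud_available:
        return "local"   # offline: the device is the only option
    if req.prompt_tokens > local_context_limit:
        return "cloud"   # too large for the assumed local budget
    return "local"       # default to the device to avoid token charges


print(choose_backend(Request(prompt_tokens=500, contains_private_data=False)))
print(choose_backend(Request(prompt_tokens=50_000, contains_private_data=False)))
```

Even this toy version shows why the hard part is not the rule itself: someone has to decide what counts as private, what the local budget is on each device, and how to keep quality consistent across the two backends.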

Perhaps the most intriguing aspect is their customer strategy: first win orders from enterprise customers, namely chip manufacturers, then reach consumers through hardware pre-installation. This “B to C” path looks steady but is fraught with challenges. Chip manufacturers are willing to pay for your software only because your solution helps them sell more hardware. If consumer demand doesn’t explode as expected, will these collaborations turn into one-time transactions? Shipping pre-installed on tens of thousands of devices sounds impressive, but against the backdrop of a trillion-yuan AI market, it feels more like testing the waters than making a breakthrough.

The drive of a young team is commendable, but entrepreneurship isn’t a lab-based algorithm competition. When Wang Guanbo says, “MoE has ten times more sparsity to reduce,” he may be forgetting that in the business world, a 10x improvement in performance is worth less than a 2x improvement in user experience. Edge intelligence is indeed the future, but many “technically correct” pioneers have fallen along the way. What WanGe ZhiYuan needs to prove isn’t just how large a model their engine can run, but why ordinary users should tolerate all the inconveniences of local deployment just to save a bit on token costs.

In this contest between cloud and edge, the most ironic takeaway might be: while everyone chases smarter models, the real deciding factor could be who can make users forget the model exists altogether.

Disclaimer: The above content is generated by AI and is for reference only.

Tags: Funding, Inference, Deployment
