A Tsinghua University Gen-Z Team Raises Tens of Millions in Two Funding Rounds to Address Token Billing Anxiety | Smart Emergence Exclusive
Analysis
When a team of PhDs born after 2000 claims it can run an 80B-parameter large model in just 4GB of memory, the tech community’s first reaction is usually, “Another PPT startup.” But the tens of millions in funding that WanGe ZhiYuan recently secured suggests, at the very least, that the AI bubble has not yet ruined the capital market’s appetite. Founded by Tsinghua PhD Wang Guanbo, the company is betting on a path directly opposite to mainstream cloud services: running large models on your local device, and running them fast.
Their core argument is sharp: cloud-based inference is a dead end. However far per-token prices fall, the savings cannot keep pace with the rate at which AI agent tools consume computing power. Tools like Claude Code and OpenClaw have caused an explosion in token demand, while chip manufacturers are stuck with devices limited to 32GB of memory, which is nearly the physical limit for edge hardware. WanGe ZhiYuan’s cPilot engine claims to free up enough memory to run an 80B model in the space where only a 4B model would otherwise fit, at 12 times the speed of other solutions.
This data deserves close scrutiny. If true, it means a standard AI PC’s inference capability could rival that of a small server, saving about 2,000 yuan in hardware costs per machine. But the word “if” is critical. The industry remains cautiously skeptical of such claims: dynamic sparse activation algorithms sound great, but how can they guarantee the model doesn’t lose accuracy under such extreme compression? Could those “predicted” skipped parameters turn out to be the key links in answering complex questions? What the tech team needs to prove isn’t peak performance, but stable and reliable inference quality in daily use.
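A rough back-of-envelope calculation shows why the claim invites scrutiny. The sketch below is illustrative only: it counts weight storage alone (no KV cache, activations, or runtime overhead), the bit widths are assumptions rather than figures from the company, and the function names are hypothetical. It does not describe how cPilot actually works.

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GB (1 GB = 10^9 bytes)."""
    # params_billion * 1e9 params * (bits / 8) bytes, divided by 1e9 bytes per GB
    return params_billion * bits_per_param / 8


def resident_fraction(budget_gb: float, params_billion: float,
                      bits_per_param: float) -> float:
    """Share of the weights that can sit in memory at once under a fixed budget."""
    needed = weight_memory_gb(params_billion, bits_per_param)
    return min(1.0, budget_gb / needed)


if __name__ == "__main__":
    # Assumed bit widths: 8-bit and 4-bit quantization (not stated in the article).
    print(weight_memory_gb(4, 8))       # 4B model at 8 bits per weight
    print(weight_memory_gb(80, 4))      # 80B model at 4 bits per weight
    print(resident_fraction(4, 80, 4))  # share of the 80B model that fits in 4GB
```

Under these assumptions, an 80B model quantized to 4 bits still needs about 40GB for its weights, so a 4GB budget can hold only about a tenth of them at any moment. Whatever the engine does, most parameters must be absent from memory at each step, which is exactly why the accuracy of the “predicted” skips matters so much.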
What’s even more interesting is their strategic choice: they’re not building small models or focusing on post-training. This contrarian approach represents a kind of hedge against industry trends. While everyone else competes on cloud pricing, they’re betting that the demand for localized deployment will suddenly surge. Last year they were adapting for chip manufacturers; this year the explosion of OpenClaw gave them a glimpse of consumer-facing opportunities. This kind of agile pivot might be a strength of a young team—no historical baggage, ready to change course at any moment.
But here’s the question: edge intelligence has been hyped for years, so why hasn’t it truly taken off? Wang Guanbo attributes it to “overlooked memory consumption,” but this diagnosis may be oversimplified. Is the reluctance to run models locally due solely to insufficient memory? Aren’t privacy concerns, the hassle of updates and maintenance, and the lack of killer apps even more critical factors? Their Amis platform aims to act as an “orchestration hub,” automatically allocating work between cloud and local computing power. This is a clever idea, but building an ecosystem has never been something technology can solve alone.
Perhaps the most intriguing aspect is their customer strategy: first win orders from business customers, namely chip manufacturers, then reach consumers through hardware pre-installation. This “B to C” path looks steady but is fraught with challenges. Chip manufacturers will pay for the software only if it helps them sell more hardware. If consumer demand doesn’t explode as expected, will these collaborations turn into one-time transactions? Pre-installing on tens of thousands of devices sounds impressive, but against the backdrop of a trillion-yuan AI market, it feels more like testing the waters than making a breakthrough.
The drive of a young team is commendable, but entrepreneurship isn’t a lab-based algorithm competition. When Wang Guanbo says, “MoE has ten times more sparsity to reduce,” he may be forgetting that in the business world, a 10x improvement in performance is often worth less than a 2x improvement in user experience. Edge intelligence is indeed the future, but many “technically correct” pioneers have fallen along the way. What WanGe ZhiYuan needs to prove isn’t just how large a model its engine can run, but why ordinary users should tolerate all the inconveniences of local deployment just to save a bit on token costs.
In this contest between cloud and edge, the most ironic takeaway might be: while everyone chases smarter models, the real deciding factor could be who can make users forget the model exists altogether.
Disclaimer: The above content is generated by AI and is for reference only.