GLM-5 Soft-Launches on OpenRouter Under a Fake Name

A model called Pony Alpha appeared on OpenRouter: free, 200K context, Claude Opus 4.5 performance. Query the API about its origins and it claims to be a GLM model from Zhipu AI.


A mysterious model called Pony Alpha showed up on OpenRouter last week. It's free. It handles 200K context. It performs at Claude Opus 4.5 levels. The provider field just says "Stealth."

The giveaway: query the API about its origins and it claims to be a GLM model developed by Zhipu AI.
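
Reproducing the check is straightforward, since OpenRouter exposes an OpenAI-compatible chat completions endpoint. Here is a minimal sketch; the model slug "openrouter/pony-alpha" is an assumption (check OpenRouter's model list for the actual identifier):

```python
# Ask the stealth model who made it, via OpenRouter's
# OpenAI-compatible chat completions endpoint.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openrouter/pony-alpha",  # assumed slug for the stealth model
        "messages": [
            {"role": "user", "content": "What model are you, and who developed you?"}
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```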

The timing is about as subtle as a billboard. Hugging Face just merged GLM MoE DSA support into the transformers library. Zhipu publicly stated GLM-5 would launch around Chinese New Year, which falls this week. And "Pony"? That's the Year of the Horse in the Chinese zodiac. Someone at Zhipu is having fun with this.

The Model Behind the Mask

According to GLM5.net, GLM-5 runs 745 billion parameters in a mixture-of-experts architecture, with 44 billion active per token. It borrows DeepSeek's sparse attention mechanism for long-context handling (the DSA design DeepSeek introduced in V3.2 to cut long-context costs). Perhaps most notably, Zhipu trained the whole thing on Huawei Ascend chips using MindSpore. Full independence from US semiconductor hardware.
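
Some back-of-envelope arithmetic makes the MoE split concrete. All 745B weights must be resident in memory, but each token only exercises the 44B active subset, so memory scales with total parameters while per-token compute scales with active ones. The figures below are illustrative, not official:

```python
# Illustrative memory and compute estimates for a 745B-total /
# 44B-active MoE. Not official numbers.
TOTAL_PARAMS = 745e9
ACTIVE_PARAMS = 44e9

# Weight memory: every parameter must be held, regardless of routing.
for name, bytes_per_param in [("BF16", 2), ("FP8", 1), ("INT4", 0.5)]:
    weight_gb = TOTAL_PARAMS * bytes_per_param / 1e9
    print(f"{name}: ~{weight_gb:,.0f} GB of weights resident in memory")

# Per-token compute: roughly 2 FLOPs per *active* parameter for the
# forward pass, which is why MoE inference is cheap relative to size.
print(f"~{2 * ACTIVE_PARAMS / 1e9:,.0f} GFLOPs per generated token (forward only)")
```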

The model targets five capabilities: creative writing, coding, reasoning, agentic workflows, and long-context processing. The weights are expected to ship open under an MIT license.

So why not just... announce it?

OpenRouter's Pony Alpha has processed over 1.3 million requests in a single day. That's a stress test no internal QA team could replicate. By running anonymously on a Western platform, Zhipu gets real-world load testing, English-language user feedback, and international buzz without the PR risk of a botched official launch. All at once.

The "free" pricing starts to make sense when you see it this way. It's not charity; it's customer acquisition dressed up as research. OpenRouter explicitly notes that all prompts and completions are logged and may be used for model improvement.

Alibaba's Qwen team did something similar with Qwen2.5 on Hugging Face last year, seeding developer enthusiasm weeks before the official announcement. Chinese labs can't rely on the same press cycle dynamics as US companies; they let models speak for themselves on platforms where developers actually work.

The mystery generates engagement that no press release could match. r/LocalLLaMA threads speculating about model identity. Twitter detectives analyzing output styles. Benchmark runners racing to evaluate performance. You can't buy this kind of attention.

The Hugging Face PR, authored by Cyrilvallez, adds GlmMoeDsa configuration and architecture support; it merged on February 9 after 20 commits and 26 passing checks. The timing strongly suggests the weights will drop any day now. For the open-source community, this matters: at 745B parameters, GLM-5 would be one of the largest open-weights models available, and with only 44B parameters active per token, the MoE design keeps local inference feasible with the right hardware. If it genuinely matches Claude Opus 4.5 on coding and agentic tasks, as early reports suggest, it fills a real gap in the open-source lineup.
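
With the GlmMoeDsa architecture merged, loading should follow the standard transformers path once weights appear. A minimal sketch, assuming a repo id like "zai-org/GLM-5" (hypothetical; Zhipu has not published one yet):

```python
# Sketch of loading GLM-5 through transformers once the weights land.
# The repo id below is a guess based on Zhipu's existing HF org.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/GLM-5"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",     # shard the MoE across available GPUs
    torch_dtype="auto",    # use the checkpoint's native precision
)

inputs = tokenizer(
    "Write a haiku about the Year of the Horse.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```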

Our read: Zhipu is betting that a week of anonymous hype, followed by "surprise, that was us," generates more attention than any announcement could. Given the ongoing debate about open versus closed models, having another strong open-weights contender at the frontier matters. The formal GLM-5 announcement should land within days. In the meantime, Pony Alpha is free to try, and Zhipu is watching.

Frequently Asked Questions