We gave two frontier models the exact same five coding prompts and recorded what each one built. No edits, no retries, no cherry-picking. Fugu Ultra from Sakana AI and GLM 5.2 from Z.ai each wrote a self-playing Asteroids, a self-playing Pong, a plasma field, a wormhole tunnel, and a hyperspace starfield, every one a single self-contained HTML file with no libraries. Both models run on EmpirioLabs behind one OpenAI-compatible API, so every run used the same request body with only the model name swapped.
Watch all five tests
How we ran it
Each prompt went to each model as one user message, one shot, and we rendered exactly what came back with no edits. Reasoning was at its maximum for both: Fugu Ultra's thinking is always on, and GLM 5.2 ran at its highest reasoning-effort setting. We set no temperature override and no system prompt, and capped output at 32,000 tokens. Every prompt asked for a single self-contained HTML file with all CSS and JavaScript inline, no external libraries, no CDN links, and no imports.
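The setup above can be sketched as a request body. This is an illustration under stated assumptions, not the exact payload we sent: `max_tokens` is the standard Chat Completions field for capping output, while the `reasoning_effort` field name and its `"max"` value are assumptions about how the reasoning setting is exposed, and `build_payload` is a hypothetical helper.

```python
import json

MAX_OUTPUT_TOKENS = 32000  # output cap stated in the article


def build_payload(model: str, prompt: str) -> dict:
    """Build one single-shot request body: one user message and nothing else."""
    return {
        "model": model,
        # One user message, no system prompt.
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": MAX_OUTPUT_TOKENS,
        # ASSUMPTION: field name and value are illustrative; check the provider docs.
        "reasoning_effort": "max",
        # No "temperature" key, so the provider default stays in place.
    }


prompt = "Build a self-playing Asteroids game as a single HTML file, no libraries."
fugu = build_payload("fugu-ultra", prompt)
glm = build_payload("glm-5-2", prompt)

# The two bodies should differ only in the model id.
differing_keys = [key for key in fugu if fugu[key] != glm[key]]
print(differing_keys)  # ['model']
print(json.dumps(glm, indent=2))
```

Keeping everything except the model id in one helper makes it harder to give one model a different setting by accident.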
The results
Both models returned working code on all five prompts on the first try. Here is the size of each answer, measured in lines of the final HTML file.
| Test | Fugu Ultra | GLM 5.2 |
|---|---|---|
| Self-playing Asteroids | 948 lines | 656 lines |
| Self-playing Pong | 486 lines | 412 lines |
| Plasma field | 298 lines | 131 lines |
| Wormhole tunnel | 255 lines | 199 lines |
| Hyperspace starfield | 241 lines | 166 lines |
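To put the table in relative terms, the short script below computes how much longer each Fugu Ultra file is than its GLM 5.2 counterpart. The line counts are copied from the table; the variable names are only illustrative.

```python
# Line counts copied from the results table: (Fugu Ultra, GLM 5.2).
LINE_COUNTS = {
    "Self-playing Asteroids": (948, 656),
    "Self-playing Pong": (486, 412),
    "Plasma field": (298, 131),
    "Wormhole tunnel": (255, 199),
    "Hyperspace starfield": (241, 166),
}

# Ratio of Fugu Ultra lines to GLM 5.2 lines for each test.
ratios = {test: fugu / glm for test, (fugu, glm) in LINE_COUNTS.items()}
total_fugu = sum(fugu for fugu, _ in LINE_COUNTS.values())
total_glm = sum(glm for _, glm in LINE_COUNTS.values())

for test, ratio in ratios.items():
    print(f"{test}: {ratio:.2f}x")
print(f"Total: {total_fugu} vs {total_glm} lines ({total_fugu / total_glm:.2f}x)")
```

The gap runs from about 1.18x on Pong to about 2.27x on the plasma field, and about 1.42x across all five files (2,228 lines against 1,564). Line count measures size only, not quality.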
What we noticed
The two models work very differently under the hood, and the test shows it. Fugu Ultra is a multi-agent orchestration model: it runs several internal reasoning passes before it answers, so it spent far longer per task and produced much more reasoning along the way. It also wrote more lines of code on every prompt. GLM 5.2 is a fast single-pass model with a 1M token context window, and it returned tighter files in a fraction of the time. Neither approach is the winner here. They are built for different jobs, and the right pick depends on whether you want maximum depth per request or speed and volume.
We are deliberately not naming a winner. Watch the clip, see how each render looks and behaves, and judge for your own use case.
Run the same test yourself
Both models serve the OpenAI-compatible Chat Completions API, so switching between them is a one-line change. Point base_url at https://api.empiriolabs.ai/v1 and set the model id to fugu-ultra or glm-5-2.
    curl https://api.empiriolabs.ai/v1/chat/completions \
      -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "fugu-ultra",
        "messages": [{"role": "user", "content": "Build a self-playing Asteroids game as a single HTML file, no libraries."}]
      }'
Change "model": "fugu-ultra" to "model": "glm-5-2" and run it again. That is the whole point of EmpirioLabs: every frontier model behind one API, so you can compare them on your own prompts without rewiring anything. You can also run both side by side in the playground.
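If you would rather script the swap than edit the command by hand, here is a minimal sketch using only the Python standard library. It sends the same body as the curl command to each model id in turn. `build_request` and `run_comparison` are hypothetical helper names, and reading the reply from `choices[0].message.content` assumes the response follows the standard Chat Completions shape.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.empiriolabs.ai/v1"  # endpoint from the article
MODELS = ["fugu-ultra", "glm-5-2"]  # model ids from the article
PROMPT = "Build a self-playing Asteroids game as a single HTML file, no libraries."


def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build the same POST the curl command sends, for a given model id."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_comparison(api_key: str) -> dict:
    """Send the prompt to each model and return the reply text per model.

    ASSUMPTION: the reply text sits at choices[0].message.content.
    """
    replies = {}
    for model in MODELS:
        with urllib.request.urlopen(build_request(model, PROMPT, api_key)) as resp:
            data = json.load(resp)
        replies[model] = data["choices"][0]["message"]["content"]
    return replies


if __name__ == "__main__":
    key = os.environ.get("EMPIRIOLABS_API_KEY")
    if key:  # only call the API when a key is configured
        for model, text in run_comparison(key).items():
            print(model, len(text.splitlines()), "lines")
```

The model id is the only value that changes between the two requests, which is what keeps the comparison like for like.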
Frequently asked questions
Which models were tested?
Fugu Ultra from Sakana AI and GLM 5.2 from Z.ai, both available on EmpirioLabs through one OpenAI-compatible API.
What were the five coding tasks?
A self-playing Asteroids game, a self-playing Pong game, a demoscene plasma effect, an infinite wormhole tunnel, and a hyperspace starfield warp. Each had to be a single self-contained HTML file with no external libraries.
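If you rerun the tasks, one way to check the "single self-contained file" requirement is to scan the returned HTML for tags that load code or styles from elsewhere. The sketch below is illustrative and was not part of the original test. `ExternalRefFinder` and `external_refs` are hypothetical names, and the check only covers `<script src>` and stylesheet `<link>` tags, so it will not catch `import` statements inside inline scripts.

```python
from html.parser import HTMLParser


class ExternalRefFinder(HTMLParser):
    """Collect tags that pull in scripts or stylesheets from outside the file."""

    def __init__(self) -> None:
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("src"):
            # A script with a src attribute is loaded from another location.
            self.external.append(("script", attr["src"]))
        elif tag == "link" and "stylesheet" in (attr.get("rel") or "").lower():
            self.external.append(("link", attr.get("href")))


def external_refs(html: str) -> list:
    """Return (tag, url) pairs for every external script or stylesheet found."""
    finder = ExternalRefFinder()
    finder.feed(html)
    finder.close()
    return finder.external


inline = (
    "<html><head><style>body{margin:0}</style></head>"
    "<body><canvas></canvas><script>let t = 0;</script></body></html>"
)
with_cdn = '<html><head><script src="https://cdn.example.com/lib.js"></script></head></html>'

print(external_refs(inline))    # []
print(external_refs(with_cdn))  # [('script', 'https://cdn.example.com/lib.js')]
```

An empty list means no external script or stylesheet tags were found. It does not show that the page renders correctly.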
Was anything edited or retried?
No. Each model got one shot per prompt and we rendered exactly what it returned. We kept the result whether it looked great or not.
Why does Fugu Ultra take longer?
Fugu Ultra is a multi-agent orchestration model with always-on reasoning. It runs multiple internal passes before answering, which trades speed for depth. GLM 5.2 answers in a single pass.
How do I switch between the two models?
Change one string. Both serve the OpenAI Chat Completions API at https://api.empiriolabs.ai/v1, so you set the model id to fugu-ultra or glm-5-2 and everything else stays the same.
Try it
Open the playground | Fugu Ultra model page | GLM 5.2 model page | Pricing



