GLM 5.2 Benchmark Results
Last updated: June 2026 · Sources: HuggingFace Open LLM Leaderboard, Artificial Analysis, LiveCodeBench
Scores are aggregated from public leaderboards. Independent results may vary.
Coding Benchmarks
HumanEval (pass@1), LiveCodeBench, SWE-bench Verified
| Model | HumanEval (pass@1) | LiveCodeBench | SWE-bench Verified |
|---|---|---|---|
| GLM 5.2 (this model) | 92.1% | 68.4% | 51.2% |
| Claude Fable | 94.3% | 71.2% | 55.1% |
| Kimi 2.7 | 90.8% | 66.9% | 49.7% |
| GPT-4o | 90.2% | 63.4% | 48.9% |
| Qwen 2.5 Coder | 88.5% | 61.2% | 44.3% |
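
For readers unfamiliar with the pass@1 metric in the HumanEval column, the sketch below shows the commonly used unbiased pass@k estimator: with n generated samples per problem, of which c pass the unit tests, pass@k is 1 - C(n-c, k) / C(n, k), averaged over problems. This page does not say how the aggregated leaderboards computed their scores, so treat this as an illustration of the metric rather than the method behind these numbers. The function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the tests, k: budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, so a pass@1 score is the share of sampled solutions that pass, averaged across problems.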
General Intelligence Benchmarks
MMLU, MATH-500, GPQA Diamond
| Model | MMLU | MATH-500 | GPQA Diamond |
|---|---|---|---|
| GLM 5.2 (this model) | 88.4% | 82.1% | 65.3% |
| Claude Fable | 91.2% | 85.6% | 69.7% |
| Kimi 2.7 | 87.9% | 80.3% | 63.1% |
| GPT-4o | 88.7% | 76.6% | 53.6% |
| Qwen 2.5 Coder | 84.1% | 75.9% | 57.2% |
Speed (API)
Output tokens/sec and Time to First Token via OpenRouter — June 2026
| Model | Output tokens/sec | TTFT |
|---|---|---|
| GLM 5.2 (this model) | ~85 | ~0.6s |
| Claude Fable | ~70 | ~0.8s |
| Kimi 2.7 | ~90 | ~0.5s |
| GPT-4o | ~65 | ~0.9s |
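
To see how throughput and time to first token combine, the sketch below estimates end-to-end response time as TTFT + output_tokens / tokens_per_sec. It assumes a constant generation rate after the first token and uses the approximate figures from the table above, so the results are rough estimates, not measurements. The 500-token response length and the function names are illustrative assumptions.

```python
# (output tokens/sec, TTFT in seconds), copied from the Speed (API) table.
# All figures are approximate ("~") values.
SPEED = {
    "GLM 5.2": (85, 0.6),
    "Claude Fable": (70, 0.8),
    "Kimi 2.7": (90, 0.5),
    "GPT-4o": (65, 0.9),
}


def estimated_latency(tokens_per_sec, ttft_sec, output_tokens):
    """Rough total response time, assuming a constant generation rate."""
    return ttft_sec + output_tokens / tokens_per_sec


def latency_table(output_tokens):
    """Estimated seconds per model for a response of the given length."""
    return {
        model: round(estimated_latency(tps, ttft, output_tokens), 2)
        for model, (tps, ttft) in SPEED.items()
    }


if __name__ == "__main__":
    # Print models from fastest to slowest for a 500-token response.
    for model, seconds in sorted(latency_table(500).items(), key=lambda kv: kv[1]):
        print(f"{model}: ~{seconds:.1f}s for 500 output tokens")
```

Under these assumptions a 500-token response takes roughly 6.1s on Kimi 2.7, 6.5s on GLM 5.2, 7.9s on Claude Fable, and 8.6s on GPT-4o. For long outputs the tokens/sec figure dominates, and for short ones TTFT matters more.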
Verdict
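GLM 5.2 is a top-3 coding model as of June 2026: among the models compared here, it places second on HumanEval, LiveCodeBench, and SWE-bench Verified. It trails Claude Fable on all three coding benchmarks, by 2.2 points on HumanEval, 2.8 on LiveCodeBench, and 3.9 on SWE-bench Verified, but beats it on price per token. For pure coding tasks, especially multi-file projects using the Coding Plan feature, it is highly competitive. If budget is a priority, GLM 5.2 offers the best value among the top-tier coding models.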
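
The ranking claim can be checked directly against the coding table. The sketch below ranks each model per benchmark using the scores listed on this page. It covers only the five models compared here, and the helper names are hypothetical.

```python
# Scores (percent) copied from the Coding Benchmarks table on this page.
CODING = {
    "GLM 5.2": {"HumanEval": 92.1, "LiveCodeBench": 68.4, "SWE-bench Verified": 51.2},
    "Claude Fable": {"HumanEval": 94.3, "LiveCodeBench": 71.2, "SWE-bench Verified": 55.1},
    "Kimi 2.7": {"HumanEval": 90.8, "LiveCodeBench": 66.9, "SWE-bench Verified": 49.7},
    "GPT-4o": {"HumanEval": 90.2, "LiveCodeBench": 63.4, "SWE-bench Verified": 48.9},
    "Qwen 2.5 Coder": {"HumanEval": 88.5, "LiveCodeBench": 61.2, "SWE-bench Verified": 44.3},
}


def rank_by(benchmark, scores):
    """Map each model to its 1-based rank on one benchmark (higher is better)."""
    ordered = sorted(scores, key=lambda m: scores[m][benchmark], reverse=True)
    return {model: position + 1 for position, model in enumerate(ordered)}


def ranks_for(model, scores):
    """Rank of one model on every benchmark in the table."""
    benchmarks = next(iter(scores.values())).keys()
    return {b: rank_by(b, scores)[model] for b in benchmarks}


if __name__ == "__main__":
    print(ranks_for("GLM 5.2", CODING))
```

With the table's values, GLM 5.2 ranks second on each of the three coding benchmarks, behind Claude Fable in every case.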