GLM52.pro

GLM 5.2 Benchmark Results

Last updated: June 2026 · Sources: HuggingFace Open LLM Leaderboard, Artificial Analysis, LiveCodeBench

Scores are aggregated from public leaderboards. Independent results may vary.

Coding Benchmarks

HumanEval (pass@1), LiveCodeBench, SWE-bench Verified

Model                   HumanEval   LiveCodeBench   SWE-bench Verified
GLM 5.2 (this model)    92.1%       68.4%           51.2%
Claude Fable            94.3%       71.2%           55.1%
Kimi 2.7                90.8%       66.9%           49.7%
GPT-4o                  90.2%       63.4%           48.9%
Qwen 2.5 Coder          88.5%       61.2%           44.3%

General Intelligence Benchmarks

MMLU, MATH-500, GPQA Diamond

Model                   MMLU     MATH-500   GPQA Diamond
GLM 5.2 (this model)    88.4%    82.1%      65.3%
Claude Fable            91.2%    85.6%      69.7%
Kimi 2.7                87.9%    80.3%      63.1%
GPT-4o                  88.7%    76.6%      53.6%
Qwen 2.5 Coder          84.1%    75.9%      57.2%

Speed (API)

Output tokens/sec and Time to First Token via OpenRouter — June 2026

Model                   Tokens/sec   TTFT
GLM 5.2 (this model)    ~85          ~0.6s
Claude Fable            ~70          ~0.8s
Kimi 2.7                ~90          ~0.5s
GPT-4o                  ~65          ~0.9s
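
To read the two speed columns together, a rough estimate of total response time is TTFT plus output length divided by throughput. The sketch below applies that arithmetic to the approximate figures listed above. The 500-token response length and the function names are assumptions chosen for illustration, and real latency will vary with load, prompt length, and provider routing.

```python
# Approximate figures from the speed table: (output tokens/sec, TTFT in seconds).
SPEED = {
    "GLM 5.2": (85, 0.6),
    "Claude Fable": (70, 0.8),
    "Kimi 2.7": (90, 0.5),
    "GPT-4o": (65, 0.9),
}


def estimated_response_seconds(tokens_per_sec, ttft_seconds, output_tokens):
    """Rough end-to-end time: wait for the first token, then stream the rest."""
    return ttft_seconds + output_tokens / tokens_per_sec


def rank_by_latency(output_tokens=500):
    """Return (model, estimated seconds) pairs, fastest first.

    output_tokens is a hypothetical response length, not a measured value.
    """
    estimates = {
        model: estimated_response_seconds(tps, ttft, output_tokens)
        for model, (tps, ttft) in SPEED.items()
    }
    return sorted(estimates.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for model, seconds in rank_by_latency():
        print(f"{model:<14} ~{seconds:.1f}s for a 500-token response")
```

For short responses TTFT dominates the estimate; for long ones throughput does, so the ordering can change with the assumed output length.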

Verdict

GLM 5.2 is a top-3 coding model as of June 2026: among the models compared here, it places second on HumanEval, LiveCodeBench, and SWE-bench Verified. It trails Claude Fable by 3.9 points on SWE-bench Verified (51.2% vs. 55.1%) but beats it on price per token. For pure coding tasks, especially multi-file projects using the Coding Plan feature, it is highly competitive. If budget is a priority, GLM 5.2 offers the best value among the top-tier coding models.
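
The ranking statements in the verdict can be checked against the coding benchmark scores listed earlier on this page. The following is a minimal sketch that ranks the listed models on each benchmark and reports the gap to the leader. The helper names are hypothetical, and the comparison covers only the five models shown here, not the full leaderboards.

```python
# Scores (percent) from the coding benchmarks table on this page.
CODING = {
    "GLM 5.2": {"HumanEval": 92.1, "LiveCodeBench": 68.4, "SWE-bench Verified": 51.2},
    "Claude Fable": {"HumanEval": 94.3, "LiveCodeBench": 71.2, "SWE-bench Verified": 55.1},
    "Kimi 2.7": {"HumanEval": 90.8, "LiveCodeBench": 66.9, "SWE-bench Verified": 49.7},
    "GPT-4o": {"HumanEval": 90.2, "LiveCodeBench": 63.4, "SWE-bench Verified": 48.9},
    "Qwen 2.5 Coder": {"HumanEval": 88.5, "LiveCodeBench": 61.2, "SWE-bench Verified": 44.3},
}


def rank_of(model, benchmark, scores=CODING):
    """1-based rank of a model on one benchmark (higher score is better)."""
    ordered = sorted(scores, key=lambda m: scores[m][benchmark], reverse=True)
    return ordered.index(model) + 1


def gap_to_leader(model, benchmark, scores=CODING):
    """Percentage-point gap between the top score and this model's score."""
    best = max(row[benchmark] for row in scores.values())
    return round(best - scores[model][benchmark], 1)


if __name__ == "__main__":
    for benchmark in CODING["GLM 5.2"]:
        rank = rank_of("GLM 5.2", benchmark)
        gap = gap_to_leader("GLM 5.2", benchmark)
        print(f"{benchmark}: rank {rank} of {len(CODING)}, {gap} points behind the leader")
```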