GLM 5.2 Benchmark Results

Last updated: June 2026 · Sources: HuggingFace Open LLM Leaderboard, Artificial Analysis, LiveCodeBench

Scores are aggregated from public leaderboards. Independent results may vary.

Coding Benchmarks

HumanEval (pass@1), LiveCodeBench, SWE-bench Verified

Model	HumanEval (pass@1)	LiveCodeBench	SWE-bench Verified
GLM 5.2 (this model)	92.1%	68.4%	51.2%
Claude Fable	94.3%	71.2%	55.1%
Kimi 2.7	90.8%	66.9%	49.7%
GPT-4o	90.2%	63.4%	48.9%
Qwen 2.5 Coder	88.5%	61.2%	44.3%
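
For readers unfamiliar with the pass@1 metric in the HumanEval column, here is a minimal sketch of the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), averaged over problems. It assumes the leaderboards use this estimator, which this page does not confirm; the function names `pass_at_k` and `benchmark_score` are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that k samples drawn
    # without replacement from the n contain no correct solution.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over problems; results holds (n, c) pairs (hypothetical input format)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimator reduces to c / n, so a pass@1 score is simply the average fraction of correct samples per problem.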

General Intelligence Benchmarks

MMLU, MATH-500, GPQA Diamond

Model	MMLU	MATH-500	GPQA Diamond
GLM 5.2 (this model)	88.4%	82.1%	65.3%
Claude Fable	91.2%	85.6%	69.7%
Kimi 2.7	87.9%	80.3%	63.1%
GPT-4o	88.7%	76.6%	53.6%
Qwen 2.5 Coder	84.1%	75.9%	57.2%

Speed (API)

Output tokens/sec and time to first token (TTFT), measured via OpenRouter, June 2026

Model	Tokens/sec	TTFT
GLM 5.2 (this model)	~85	~0.6s
Claude Fable	~70	~0.8s
Kimi 2.7	~90	~0.5s
GPT-4o	~65	~0.9s
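
To see how the two speed figures combine, the sketch below estimates end-to-end response time as TTFT plus output length divided by throughput. This is a rough model that assumes a constant generation rate after the first token; the 500-token response length and the function name `estimated_latency` are illustrative assumptions, not part of the measurements.

```python
# Approximate (tokens/sec, TTFT in seconds) copied from the speed table above.
SPEED = {
    "GLM 5.2": (85, 0.6),
    "Claude Fable": (70, 0.8),
    "Kimi 2.7": (90, 0.5),
    "GPT-4o": (65, 0.9),
}


def estimated_latency(tokens_per_sec: float, ttft_s: float, output_tokens: int) -> float:
    """Rough total response time: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / tokens_per_sec


if __name__ == "__main__":
    for model, (tps, ttft) in SPEED.items():
        # 500 output tokens is an arbitrary example length.
        print(f"{model}: ~{estimated_latency(tps, ttft, 500):.1f}s")
```

Under this model the throughput term dominates for long responses, so the TTFT differences in the table matter mostly for short, interactive replies.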

Verdict

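GLM 5.2 is a top-3 coding model as of June 2026: it places second among the five models compared here on HumanEval, LiveCodeBench, and SWE-bench Verified. It trails Claude Fable by 2.2 to 3.9 percentage points across those benchmarks (the widest gap is on SWE-bench Verified) but beats it on price per token. For pure coding tasks, especially multi-file projects using the Coding Plan feature, it is highly competitive. If budget is a priority, GLM 5.2 offers the best value among the top-tier coding models.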