1. Model Introduction
GLM-4.5 is Zhipu AI’s flagship MoE language model with advanced capabilities in reasoning, function calling, and multi-modal understanding. Key highlights:- Sparse MoE architecture: 355 B / 32 B-active for frontier runs and 106 B / 12 B-active for two-node experimentation.
- Strong reasoning: built-in step-by-step reasoning, with FP8 rollout supported on Blackwell hardware.
- Speculative decoding: EAGLE/MTP rollout supported by the bash launcher; the Python launcher exposes
--enable-mtp. - R3 / MIS opt-in: routing-stability extensions available behind a flag (
--enable-mis) on the Python launcher.
2. Supported Variants
| Model | Active / Total | HF ID |
|---|---|---|
| GLM-4.5-355B-A32B | 32 B / 355 B | zai-org/GLM-4.5 |
| GLM-4.5-Air (106B-A12B) | 12 B / 106 B | zai-org/GLM-4.5-Air |
scripts/; the canonical recipe is examples/p2p_weight_transfer/GLM-4.5-Air.sh (8-node, P2P weight transfer).
3. Environment Setup
3.1 Required env vars
The bash launcher (run-glm4.5-355B-A32B.sh) requires:
run_glm45_355b_a32b.py) reads no env vars — pass options via the Typer CLI.
3.2 Download model + datasets
3.3 HF → Megatron torch_dist conversion
The bash launcher does not convert for you — produce $BASE_DIR/GLM-4.5-355B-A32B_torch_dist/ ahead of time:
tools/convert_hf_to_fp8.py → convert_checkpoint → rsync to model_local_dir → submit).
4. Launch
4.1 Quick start
4.2 Multi-node fan-out
run-glm4.5-355B-A32B.sh performs Ray fan-out internally via the ssh loop over /root/mpi_rack_hostfile.
5. Recipe Configuration
5.1 Parallelism
| Source | TP | PP | CP | EP | expert-TP | max_tokens_per_gpu | GPUs |
|---|---|---|---|---|---|---|---|
run-glm4.5-355B-A32B.sh | 8 | 4 | 2 | 16 | 1 | 16384 | 64 (8 × 8) |
run_glm45_355b_a32b.py (num_nodes ≤ 4, debug) | 4 | 1 | 1 | 4 | 1 | 16384 | ≤ 32 (≤ 4 × 8) |
run_glm45_355b_a32b.py (num_nodes == 8) | 4 | 8 | 2 | 8 | 1 | 16384 | 64 (8 × 8) |
5.2 Algorithm
| Source | Advantage | Notable flags |
|---|---|---|
run-glm4.5-355B-A32B.sh | GSPO | --eps-clip 1e-4 --eps-clip-high 2e-4 --use-tis |
run_glm45_355b_a32b.py | GRPO | --eps-clip 1e-4 --eps-clip-high 2e-4 --use-tis |
--use-rollout-routing-replay by default. The Python launcher exposes --enable-mis (TIS/RS config) as an opt-in.
5.3 Rollout & SGLang
--moe-token-dispatcher-type flex, --moe-enable-deepep.
5.4 Optimizer
CPU Adam on:5.5 Notable quirks
- The bash launcher does not set
--load/--saveinCKPT_ARGS—--loaddefaults to the value of--ref-load. run_glm45_355b_a32b.pyis Blackwell-only:_execute_trainassertsargs.hardware != "H100".
6. Pairs Well With
- Low Precision RL
- INT4 QAT
- Rollout Routing Replay (R3) — opt-in via
--enable-mison the Python launcher.

