Variants
| Model | Active / Total | HF ID | Recipe |
|---|---|---|---|
| Kimi-K2.6 | 32 B / 1 T | moonshotai/Kimi-K2.6 | kimi-k2.5 |
| Kimi-K2.5 | 32 B / 1 T | moonshotai/Kimi-K2.5 | kimi-k2.5 |
| Kimi-K2-Instruct | 32 B / 1 T | moonshotai/Kimi-K2-Instruct | kimi-k2 |
| Kimi-K2-Thinking | 32 B / 1 T | moonshotai/Kimi-K2-Thinking | kimi-k2 |
| Moonlight-16B-A3B | 3 B / 16 B | moonshotai/Moonlight-16B-A3B | moonlight |
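
The table above can be read as a lookup from Hugging Face ID to recipe name. The sketch below is only an illustration of that mapping: `RECIPE_BY_HF_ID` and `recipe_for` are hypothetical names introduced here, not part of Miles, and the values are copied from the table as given.

```python
# Hypothetical helper: mirrors the Variants table, not an actual Miles API.
RECIPE_BY_HF_ID = {
    "moonshotai/Kimi-K2.6": "kimi-k2.5",
    "moonshotai/Kimi-K2.5": "kimi-k2.5",
    "moonshotai/Kimi-K2-Instruct": "kimi-k2",
    "moonshotai/Kimi-K2-Thinking": "kimi-k2",
    "moonshotai/Moonlight-16B-A3B": "moonlight",
}


def recipe_for(hf_id: str) -> str:
    """Return the recipe name the table lists for a Hugging Face model ID."""
    try:
        return RECIPE_BY_HF_ID[hf_id]
    except KeyError:
        # Fail loudly for models the table does not cover.
        raise ValueError(f"no recipe listed for {hf_id!r}") from None
```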

Fastest path to train

Moonlight on a single 8× H100 node is the smallest Moonshot recipe and a good MoE smoke test.

model_type patch that lets Miles treat K2 as a DeepSeek-V3-shaped architecture).

