# Miles

## Docs

- [Backends Beyond Megatron](https://miles.radixark.com/docs/advanced/architecture-support.md): Embed HuggingFace implementations as black-box modules inside Megatron's parallel pipeline.
- [Fault Tolerance](https://miles.radixark.com/docs/advanced/fault-tolerance.md): Rollout-side health checks and engine recovery, gated by --use-fault-tolerance.
- [Low Precision RL](https://miles.radixark.com/docs/advanced/fp8-low-precision.md): Unified low-precision pipelines for RL — block-wise FP8, MXFP8, and NVFP4 across rollout and training.
- [Advanced Features](https://miles.radixark.com/docs/advanced/index.md): Systems-level features for large-scale and long-running RL.
- [INT4 Quantization-Aware Training](https://miles.radixark.com/docs/advanced/int4-qat.md): Fit large models on a single 8-GPU node by training with W4A16 quantization in the loop.
- [LoRA Training and Serving](https://miles.radixark.com/docs/advanced/lora.md): Train LoRA adapters with miles SFT or RL recipes and serve them through SGLang from the same checkpoint.
- [Rollout Routing Replay (R3)](https://miles.radixark.com/docs/advanced/miles-router.md): Capture expert routing during inference and replay it during training to stabilize RL.
- [P2P Weight Transfer](https://miles.radixark.com/docs/advanced/p2p-weight-transfer.md): Direct rank-to-rank weight sync from actor to rollout via RDMA writes.
- [PD Disaggregation](https://miles.radixark.com/docs/advanced/pd-disaggregation.md): Separate prefill and decode pools so each is sized for its workload.
- [Speculative Decoding](https://miles.radixark.com/docs/advanced/speculative-decoding.md): Draft + target speculative rollout, with online SFT for MTP-style drafts.
- [Blog](https://miles.radixark.com/docs/blog/index.md): Engineering posts and release notes from the Miles team.
- [Architecture Overview](https://miles.radixark.com/docs/developer/architecture.md): The 30-minute tour of how Miles is organized internally.
- [Contributing](https://miles.radixark.com/docs/developer/contributor-guide.md): PR conventions, code layout, and how reviews work.
- [Debugging](https://miles.radixark.com/docs/developer/debug.md): Aligning precision, separate train/rollout debugging, common kernel pitfalls.
- [Experimental Features](https://miles.radixark.com/docs/developer/experimental-features.md): Backends and features that exist in tree but are not production-ready — opt-in at your own risk.
- [Developer Guide](https://miles.radixark.com/docs/developer/index.md): Architecture, contribution conventions, debugging, and migration notes.
- [Migration Guide](https://miles.radixark.com/docs/developer/migration.md): Sync → async loop, breaking flag changes between releases.
- [Fully Async Rollout](https://miles.radixark.com/docs/examples/fully-async.md): Keep generation running continuously in the background so the trainer never waits.
- [Examples](https://miles.radixark.com/docs/examples/index.md): Annotated end-to-end walkthroughs for the workflows people actually want to build.
- [Multi-Agent Co-Evolution](https://miles.radixark.com/docs/examples/multi-agent.md): Two specialized agents train together and improve each other.
- [SFT on OpenHermes](https://miles.radixark.com/docs/examples/openhermes-sft.md): Plain supervised fine-tuning of Qwen3-4B-Base on the OpenHermes-2.5 dataset.
- [Reproducibility Recipe](https://miles.radixark.com/docs/examples/reproducibility.md): Bit-stable training across reruns. Determinism flags, seeds, and what to watch.
- [ReTool — Code Execution Tool Use](https://miles.radixark.com/docs/examples/retool.md): SFT + RL pipeline that teaches a model to interleave thinking with sandboxed Python execution.
- [Search-R1 (Tool Use)](https://miles.radixark.com/docs/examples/search-r1.md): Train a model to issue search queries, integrate observations, and answer multi-turn QA.
- [FAQ](https://miles.radixark.com/docs/faq.md): The questions every new Miles user asks in their first week.
- [Getting Started](https://miles.radixark.com/docs/getting-started/index.md): Install Miles and run your first RL training job — the two pages you need to go from zero to a working loop.
- [Installation](https://miles.radixark.com/docs/getting-started/installation.md): Install Miles on NVIDIA or AMD GPUs. Docker is the recommended path.
- [Quick Start](https://miles.radixark.com/docs/getting-started/quick-start.md): A working RL training job on Qwen3-4B in under an hour.
- [Miles Documentation](https://miles.radixark.com/docs/index.md)
- [DeepSeek R1 / V3](https://miles.radixark.com/docs/models/deepseek/deepseek.md): Launch recipe for DeepSeek-R1 / DeepSeek-V3 (671 B total / 37 B active) on 16 nodes × 8 H100.
- [DeepSeek-V4 Flash](https://miles.radixark.com/docs/models/deepseek/deepseek-v4-flash.md): Launch recipe for DeepSeek-V4-Flash (284 B) — FP8 rollout / BF16 train, 8-node H200 (64 GPUs).
- [DeepSeek-V4 Pro](https://miles.radixark.com/docs/models/deepseek/deepseek-v4-pro.md): Launch recipe for DeepSeek-V4-Pro (1.6 T) — V4-family architecture at Pro scale.
- [DeepSeek](https://miles.radixark.com/docs/models/deepseek/index.md): Miles recipes for the DeepSeek family — DeepSeek-V4 Flash (sparse-MLA + DSA indexer), V3, and R1.
- [GLM4](https://miles.radixark.com/docs/models/glm/glm4.md): Launch recipes for GLM-Z1-9B-0414. The 32 B model config ships without a launcher.
- [GLM4.5](https://miles.radixark.com/docs/models/glm/glm4-5.md): Launch recipes for GLM-4.5 (355B-A32B) — bash launcher and Python launcher.
- [GLM4.7 Flash](https://miles.radixark.com/docs/models/glm/glm4-7-flash.md): Launch recipes for GLM-4.7-Flash — compact MLA + MoE with R3 enabled by default.
- [GLM-5 / GLM-5.1](https://miles.radixark.com/docs/models/glm/glm5.md): Launch recipe for GLM-5 and GLM-5.1 (744 B / 40 B active) — Python launcher, 16+ node config.
- [GLM](https://miles.radixark.com/docs/models/glm/index.md): Miles recipes for the GLM4, GLM4.5, GLM4.7 Flash, and GLM5 families — dense and MoE.
- [GPT-OSS 20B](https://miles.radixark.com/docs/models/gpt-oss/gpt-oss.md): Two launchers — Megatron BF16 (8 GPU, mbridge) and FSDP (4 GPU, dequantizes MXFP4 → BF16 first).
- [Supported Models](https://miles.radixark.com/docs/models/index.md): Per-family recipes covering weight conversion, launch flags, and parallelism choices.
- [Kimi](https://miles.radixark.com/docs/models/kimi/index.md): Miles recipes for the Moonshot family — Kimi K2.6 / K2.5 (multimodal, 1 T / 32 B-A), Kimi K2 / K2-Thinking, and Moonlight 16B-A3B.
- [Kimi K2](https://miles.radixark.com/docs/models/kimi/kimi-k2.md): Launch recipes for Kimi-K2-Instruct and Kimi-K2-Thinking — 32 nodes × 8 GPU.
- [Kimi K2.5 / K2.6](https://miles.radixark.com/docs/models/kimi/kimi-k2.5.md): Launch recipe for Kimi-K2.5, running full-parameter GRPO on 32 × 8 H200 with an INT4 actor and a BF16 reference.
- [Moonlight](https://miles.radixark.com/docs/models/kimi/moonlight.md): Single-node MoE recipe (8 GPU) — DAPO-style dynamic sampling and CPU Adam on by default.
- [MiMo](https://miles.radixark.com/docs/models/mimo/mimo.md): Single-node GRPO + EAGLE speculative recipe with online MTP training.
- [Nemotron](https://miles.radixark.com/docs/models/nemotron/index.md): Miles recipes for NVIDIA's Nemotron-3 family — Mamba+Attention(+MoE) hybrids loaded via Megatron AutoBridge.
- [Nemotron-3-Nano](https://miles.radixark.com/docs/models/nemotron/nemotron-3-nano.md): Launch recipe for the dense NVIDIA Nemotron-3-Nano-4B (Mamba+Attention hybrid) via Megatron AutoBridge.
- [Nemotron-3-Nano MoE](https://miles.radixark.com/docs/models/nemotron/nemotron-3-nano-moe.md): Launch recipe for NVIDIA Nemotron-3-Nano-30B-A3B (Mamba+Attention+MoE hybrid) via Megatron AutoBridge.
- [Nemotron-3-Super](https://miles.radixark.com/docs/models/nemotron/nemotron-3-super.md): Launch recipe for NVIDIA Nemotron-3-Super-120B-A12B-FP8 (Mamba+Attention+MoE hybrid, FP8 native) via Megatron AutoBridge.
- [Qwen](https://miles.radixark.com/docs/models/qwen/index.md): Miles recipes for the full Qwen3, Qwen3.5, and Qwen3-Next line — dense and MoE.
- [Qwen3](https://miles.radixark.com/docs/models/qwen/qwen3.md): Launch recipes for dense Qwen3 models (0.6 B – 32 B).
- [Qwen3.5](https://miles.radixark.com/docs/models/qwen/qwen3-5.md): Launch recipes for Qwen3.5-4B / 9B / 27B with attention-output-gate.
- [Qwen3.5 MoE](https://miles.radixark.com/docs/models/qwen/qwen3-5-moe.md): Launch recipe for Qwen3.5-35B-A3B with MTP training and EAGLE speculative rollout.
- [Qwen3.6](https://miles.radixark.com/docs/models/qwen/qwen3-6.md): Launch recipe for the dense Qwen3.6-27B with attention-output-gate.
- [Qwen3.6 MoE](https://miles.radixark.com/docs/models/qwen/qwen3-6-moe.md): Launch recipe for Qwen3.6-35B-A3B with MTP training and EAGLE speculative rollout.
- [Qwen3 MoE](https://miles.radixark.com/docs/models/qwen/qwen3-moe.md): Launch recipes for Qwen3-30B-A3B (single node) and Qwen3-235B-A22B (multi-node).
- [Qwen3-Next 80B-A3B](https://miles.radixark.com/docs/models/qwen/qwen3-next.md): Launch recipes for Qwen3-Next-80B-A3B-Thinking on Megatron and FSDP backends.
- [AMD MI300X](https://miles.radixark.com/docs/platforms/amd.md): ROCm 6.3+ with patches for virtual memory management. Same launch scripts.
- [Platforms](https://miles.radixark.com/docs/platforms/index.md): Hardware-specific tutorials. Most users want NVIDIA H/B; AMD MI300X is supported via ROCm.
- [NVIDIA H / B Series](https://miles.radixark.com/docs/platforms/nvidia.md): H100, H200, B100, B200 — Miles's primary target.
- [Agentic Chat Templates (TITO)](https://miles.radixark.com/docs/user-guide/agentic-chat-template.md): How to turn on and verify Token-In-Token-Out (TITO) for multi-turn agentic rollout.
- [Argument Groups](https://miles.radixark.com/docs/user-guide/argument-groups.md): The launch-script argument groups used by Miles recipes, with links to the flags that belong in each group.
- [CLI Reference](https://miles.radixark.com/docs/user-guide/cli-reference.md): Every command-line flag Miles accepts, grouped by subsystem.
- [Core Concepts](https://miles.radixark.com/docs/user-guide/concepts.md): The four objects that make up every Miles job, and how data flows between them.
- [Customization](https://miles.radixark.com/docs/user-guide/customization.md): The 22 plug-points where you can drop in your own Python without forking Miles.
- [Fully Async Rollout](https://miles.radixark.com/docs/user-guide/fully-async.md): How fully async rollout decouples generation from training, when to use it, and which flags enable it.
- [User Guide](https://miles.radixark.com/docs/user-guide/index.md): Concepts, launch script walkthrough, customization hooks, and a complete CLI reference.
- [Monitoring & Logging](https://miles.radixark.com/docs/user-guide/monitoring.md): wandb, structured logs, profiling, and what to look at when something looks off.
- [Rollout Endpoints](https://miles.radixark.com/docs/user-guide/rollout-endpoints.md): How Miles talks to SGLang. The /generate endpoint and the OpenAI-format /v1/chat/completions endpoint.
- [Training Script Walkthrough](https://miles.radixark.com/docs/user-guide/training-script-walkthrough.md): An annotated tour through every argument group in a Miles launch script, plus the feature modes you turn on when a recipe isn't enough.
- [Training Backend](https://miles.radixark.com/docs/user-guide/usage.md): Megatron-LM as the training backend — parameters, parallelism, checkpoints, and hooks.

## OpenAPI Specs

- [openapi](https://miles.radixark.com/docs/api-reference/openapi.json)