Advanced Features

This section covers the Miles features that the Core-features section of the homepage points to: low-precision training (FP8 / MXFP8 / INT4 QAT), Rollout Routing Replay (R3) for MoE, speculative decoding, and LoRA training and serving, along with on-policy distillation.

Low Precision RL

The unified FP8 path: matched quantization between training and inference, BF16 backward and master weights.

INT4 QAT

W4A16 quantization-aware training for fitting large models on a single 8-GPU node.
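As a rough illustration of what W4A16 quantization-aware training simulates in the forward pass, the sketch below rounds weights to a signed 4-bit grid and maps them straight back to floats, while activations stay in higher precision. The function name `fake_quant_int4`, the symmetric per-group scaling, and the group size are assumptions made for this example, not the Miles implementation.

```python
def fake_quant_int4(weights, group_size=4):
    """Quantize-dequantize weights to a signed INT4 grid, one scale per group.

    Hypothetical helper for illustration only: symmetric scaling with
    integer levels in [-8, 7] is an assumed scheme.
    """
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        amax = max(abs(w) for w in group)
        # Map the largest magnitude in the group to integer level 7.
        scale = amax / 7 if amax > 0 else 1.0
        for w in group:
            q = max(-8, min(7, round(w / scale)))  # clamp to the 4-bit range
            out.append(q * scale)                  # dequantize back to float
    return out


weights = [3.5, -1.2, 0.26, 0.0]
quantized = fake_quant_int4(weights)
```

In QAT the rounding step is usually treated as the identity during the backward pass (a straight-through estimator), so the full-precision weights keep receiving gradients while the forward pass sees the quantized values.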

Rollout Routing Replay (R3)

Capture expert routing during inference and replay during training. The mechanism that keeps MoE RL stable.
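The capture-and-replay idea can be sketched on a single token with a toy router. The helper names `top_k_indices` and `gate_weights`, the logits, and the choice to recompute gate weights from the training-side logits are all assumptions for illustration; they are not taken from the Miles code.

```python
import math


def top_k_indices(logits, k):
    """Return the indices of the k largest router logits."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def gate_weights(logits, expert_ids):
    """Softmax over the logits of the selected experts only."""
    m = max(logits[i] for i in expert_ids)
    exps = [math.exp(logits[i] - m) for i in expert_ids]
    total = sum(exps)
    return [e / total for e in exps]


# Router logits for the same token, as seen by the inference engine and by
# the trainer. Small numerical differences flip the second-ranked expert.
rollout_logits = [2.0, 1.0, 0.99, -1.0]
train_logits = [2.0, 0.99, 1.0, -1.0]

recorded = top_k_indices(rollout_logits, k=2)    # captured during rollout
fresh = top_k_indices(train_logits, k=2)         # what training would pick alone
replayed = gate_weights(train_logits, recorded)  # training reuses recorded experts
```

Here `recorded` and `fresh` disagree on one expert even though the logits differ by only 0.01. Replaying `recorded` makes the training forward pass activate the same experts that produced the sampled tokens.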

Speculative Decoding

Draft + target speculative rollout, with online MTP-SFT for the draft.

On-Policy Distillation

Train a student on its own rollouts while matching teacher token probabilities through SGLang or Megatron teacher modes.
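One common way to express "matching teacher token probabilities" is a per-token divergence between the student and teacher distributions, evaluated at positions the student itself generated. The sketch below assumes a reverse KL objective and a three-token toy vocabulary; the function name `reverse_kl` and the choice of divergence are assumptions, and the objective Miles actually uses may differ.

```python
import math


def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher) over one token position's vocabulary."""
    return sum(
        s * (math.log(s) - math.log(t))
        for s, t in zip(student_probs, teacher_probs)
        if s > 0
    )


# Next-token distributions at two positions of a student-generated rollout.
student = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
teacher = [[0.5, 0.3, 0.2], [0.1, 0.8, 0.1]]

per_token_loss = [reverse_kl(s, t) for s, t in zip(student, teacher)]
loss = sum(per_token_loss) / len(per_token_loss)
```

The second position contributes zero loss because the student already matches the teacher there; only positions where the two distributions differ produce a training signal.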

LoRA Training and Serving

Train LoRA adapters with SFT or RL and serve them through SGLang from the same checkpoint.
