Low Precision RL
The unified FP8 path: matched quantization between training and inference,
with BF16 backward and master weights.
INT4 QAT
W4A16 quantization-aware training for fitting large models on a single
8-GPU node.
Rollout Routing Replay (R3)
Capture expert routing during inference and replay it during training. The
mechanism that keeps MoE RL stable.
Speculative Decoding
Draft + target speculative rollout, with online MTP-SFT for the draft.
LoRA Training and Serving
Train LoRA adapters with SFT or RL and serve them through SGLang from the
same checkpoint.
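
To make the Rollout Routing Replay entry above concrete, here is a minimal pure-Python sketch of the idea for a single token and a single MoE layer. It is an illustration under stated assumptions, not this project's implementation: the helper names (`top_k_indices`, `gate_weights`), the logit values, and the choice to renormalize gate weights over the replayed experts are all hypothetical.

```python
import math

def top_k_indices(logits, k):
    """Return the indices of the k largest router logits (hypothetical helper)."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

def gate_weights(logits, experts):
    """Softmax over the given experts only (assumed gating rule for this sketch)."""
    m = max(logits[i] for i in experts)
    exps = {i: math.exp(logits[i] - m) for i in experts}
    total = sum(exps.values())
    return {i: exps[i] / total for i in experts}

# Rollout: the inference engine routes the token and the choice is recorded.
rollout_logits = [2.0, 1.0, 0.99, -1.0]
recorded_experts = top_k_indices(rollout_logits, k=2)

# Training: small numerical differences can flip a near-tie in the router.
train_logits = [2.0, 0.99, 1.0, -1.0]
recomputed_experts = top_k_indices(train_logits, k=2)

# Replay: reuse the recorded experts, but weight them with the training
# logits so gradients still flow through the training-side router.
replayed_weights = gate_weights(train_logits, recorded_experts)

print("recorded:  ", recorded_experts)
print("recomputed:", recomputed_experts)
print("replayed weights:", replayed_weights)
```

Without replay, the token in this toy case would be sent to a different expert during training than the one that produced the rollout; reusing the recorded indices keeps the two passes on the same experts.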

