The processes
A Miles run is three kinds of processes wrapped in a Ray cluster, plus a data source object that lives inside the trainer:
- Trainer ranks: Megatron processes that load torch_dist checkpoints and run the RL loop.
- SGLang servers: independent HTTP services that produce rollouts.
- Miles Router: FastAPI proxy that distributes rollout requests, preserves metadata (R3), and enforces health checks.
- Data Source: Python object owned by the trainer; reads prompt JSONL and acts as a buffer between rollout and training.
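To make the Data Source role concrete, here is a minimal sketch of an object that reads prompt JSONL and buffers samples between rollout and training. The class name PromptBuffer, its method names, and the prompt and response fields are hypothetical; this is not the interface in miles/rollout/data_source.py.

```python
import json
from collections import deque
from typing import Iterable


class PromptBuffer:
    """Hypothetical stand-in for a data source; not the Miles class."""

    def __init__(self, jsonl_lines: Iterable[str]) -> None:
        # One JSON object per non-empty line, as in a prompt JSONL file.
        self._prompts = deque(json.loads(line) for line in jsonl_lines if line.strip())
        # Finished rollouts wait here until the trainer asks for them.
        self._samples = deque()

    def get_prompts(self, n: int) -> list:
        """Hand out up to n prompts for the rollout side to generate from."""
        return [self._prompts.popleft() for _ in range(min(n, len(self._prompts)))]

    def add_samples(self, samples: list) -> None:
        """Store generated samples so training can consume them later."""
        self._samples.extend(samples)

    def get_samples(self, n: int) -> list:
        """Hand out up to n buffered samples for a training step."""
        return [self._samples.popleft() for _ in range(min(n, len(self._samples)))]


lines = ['{"prompt": "1+1=?"}', '{"prompt": "2+2=?"}', '']
buffer = PromptBuffer(lines)
batch = buffer.get_prompts(2)
buffer.add_samples([{"prompt": p["prompt"], "response": "..."} for p in batch])
train_batch = buffer.get_samples(8)
```

The two deques keep the rollout side and the training side from having to run at the same pace, which is the buffering role the list above describes.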
The package layout
train.py and train_async.py are the two entry points. They’re thin: ~200 lines
each. Most logic lives in the modules above.
A request’s life
For a single GRPO iteration:
This is the sync path. The async path (train_async.py with --rollout-function-path fully_async_rollout.generate_rollout_fully_async) decouples the request from the trainer
loop and uses a continuously running worker.
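The difference between the two paths can be sketched with the standard library alone. This is an illustration of the general pattern, not the code in train.py or train_async.py: generate, train_step, run_sync, and run_async are hypothetical placeholders, and a thread with a bounded queue stands in for the continuously running worker.

```python
import queue
import threading


def generate(prompt: str) -> str:
    """Placeholder for a rollout request; a real run would call an SGLang server."""
    return prompt.upper()


def train_step(samples: list) -> int:
    """Placeholder for a training step; returns how many samples it consumed."""
    return len(samples)


def run_sync(prompts: list, steps: int) -> list:
    """Sync path: each iteration generates, then trains, in lockstep."""
    return [train_step([generate(p) for p in prompts]) for _ in range(steps)]


def run_async(prompts: list, steps: int, batch_size: int) -> list:
    """Async path: a worker keeps producing while the trainer consumes."""
    samples: queue.Queue = queue.Queue(maxsize=4 * batch_size)
    stop = threading.Event()

    def worker() -> None:
        # Runs continuously, independent of the trainer loop.
        i = 0
        while not stop.is_set():
            try:
                samples.put(generate(prompts[i % len(prompts)]), timeout=0.05)
                i += 1
            except queue.Full:
                continue  # Buffer is full; re-check the stop flag and retry.

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    consumed = []
    for _ in range(steps):
        # The trainer only blocks until one batch is available.
        batch = [samples.get() for _ in range(batch_size)]
        consumed.append(train_step(batch))
    stop.set()
    thread.join()
    return consumed


sync_counts = run_sync(["a", "b"], steps=3)
async_counts = run_async(["a", "b"], steps=3, batch_size=2)
```

The bounded queue is a design choice in this sketch: it caps how far generation can run ahead of training, so buffered samples do not grow without limit.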
Where common changes go
| You want to … | Edit |
|---|---|
| Add a new RL algorithm | miles/backends/training_utils/loss.py + enum in miles/utils/arguments.py |
| Add a new built-in reward type | miles/rollout/sglang_rollout.py (rm dispatch) |
| Add a new built-in filter | miles/rollout/filter_hub/ |
| Wrap a new model architecture | miles_plugins/models/<model>.py + mbridge |
| Add a new flag | miles/utils/arguments.py |
| Change weight sync | miles/backends/megatron_utils/update_weight/ and miles/utils/distributed_utils.py |
| Change rollout buffer | miles/rollout/data_source.py |
Extension points (the right way)
The trainer is plug-in-friendly. Most extensions don’t need a code change inside Miles; you pass a flag of the form --something-path my_pkg.thing instead. See Customization
for the full list.
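A path-style flag of this kind is typically resolved by importing the module part and looking up the final attribute. The sketch below shows that general technique with importlib; load_object is a hypothetical helper, and whether Miles resolves its flags exactly this way is an assumption.

```python
import importlib
from typing import Any


def load_object(dotted_path: str) -> Any:
    """Resolve "package.module.attr" to the attribute it names."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attr', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# A standard-library target stands in for a user hook such as my_pkg.thing.
dumps = load_object("json.dumps")
encoded = dumps({"a": 1})
```

With a resolver like this, the module only has to be importable from the environment the trainer runs in, which is why no change inside the package is needed.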
If you find yourself patching the trainer to make something work, that’s a sign we’re
missing a hook. Open an issue.
Tests
The tests/fast/ folder may hold only CPU CI tests: every test_*.py file
there auto-registers as stage-a-cpu, so no boilerplate is needed. Write a literal register_cpu_ci(...)
only to override the defaults. A register_cuda_ci under tests/fast/ is an error (move the file
to tests/fast-gpu/). Every other folder may hold GPU or CPU CI tests and must register each test
explicitly with register_cpu_ci / register_cuda_ci. The runner collects tests/fast/,
tests/fast-gpu/, tests/e2e/, and tests/ci/.
Run pytest tests/fast for a quick CPU check (pytest tests/fast-gpu if you have a GPU);
run tests/e2e before landing anything that touches the train loop.
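As an illustration of the no-boilerplate rule for tests/fast/, a CPU test file could be as small as the sketch below. The file name and the chunk helper are hypothetical, and the arguments of register_cpu_ci are not shown because they are not described here.

```python
# Hypothetical file: tests/fast/test_chunking.py
# No register_cpu_ci(...) call appears here: per the rule above, a test_*.py
# file in tests/fast/ is picked up with the default CPU settings.


def chunk(items: list, size: int) -> list:
    """Toy helper under test; split items into consecutive groups of `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def test_chunk_splits_evenly() -> None:
    assert chunk([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]


def test_chunk_keeps_remainder() -> None:
    assert chunk([1, 2, 3], 2) == [[1, 2], [3]]
```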
Where to look first when reading the code
If you have 30 minutes and want to understand Miles end-to-end:
- train.py for the loop, top-to-bottom.
- miles/rollout/sglang_rollout.py:generate_rollout for how prompts become samples.
- miles/backends/training_utils/loss.py for the loss and advantage computation.
- miles/router/router.py for the FastAPI proxy.
- miles/utils/distributed_utils.py for weight sync.

