What gets logged by default
Each rollout iteration emits a structured row to stdout; the exact fields depend on the backend and config.
If --use-wandb is set, metrics also go to wandb under the train/, rollout/,
and perf/ namespaces (see miles/utils/wandb_utils.py).
Enabling wandb
Pass --use-wandb, --wandb-project, and --wandb-group. WANDB_API_KEY
should be supplied via Ray’s env_vars rather than baked into the launch script.
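As a minimal sketch of the env_vars approach, the snippet below reads WANDB_API_KEY from the submitting shell and forwards it through Ray's `runtime_env`, so the key never appears in the launch script. The helper names are hypothetical, and the entrypoint you pass (script name, project and group values) is your own; only `ray job submit --runtime-env-json` and the `env_vars` field are Ray features.

```python
import json
import os


def build_runtime_env_json(environ=None):
    """Return a Ray runtime_env JSON string that forwards WANDB_API_KEY."""
    environ = os.environ if environ is None else environ
    api_key = environ.get("WANDB_API_KEY")
    if not api_key:
        raise RuntimeError("WANDB_API_KEY is not set in the submitting shell")
    return json.dumps({"env_vars": {"WANDB_API_KEY": api_key}})


def build_submit_command(runtime_env_json, entrypoint):
    """Assemble the argv for `ray job submit`.

    `entrypoint` is the training command as a list of strings, including
    --use-wandb, --wandb-project and --wandb-group.
    """
    return [
        "ray", "job", "submit",
        f"--runtime-env-json={runtime_env_json}",
        "--",
        *entrypoint,
    ]
```

The resulting list can be handed to `subprocess.run`. Avoid echoing it to shared logs, since it contains the key.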
What to watch
| Signal | Healthy pattern | Red flag |
|---|---|---|
| loss | Slow decay over hundreds of iterations | Spike → crash within an iteration |
| raw_reward | Trending up, with healthy variance | Saturates near a single value (collapse) |
| kl_loss | Bounded, drifts up over time | Sudden jump (policy diverged from ref); only logged when --use-kl-loss is set |
| train_rollout_logprob_abs_diff | Stable and small (≪ 1.0) | Climbing without bound → train/inference precision drift |
| entropy_loss | Slowly decreasing | Falls to ~0 too fast (mode collapse) |
| grad_norm | < clip_grad (1.0 by default) | Repeatedly hitting clip threshold |
| rollout_time / train_time | Roughly balanced | One ≫ other → resource imbalance |
| train/pg_clipfrac | < 0.2 | > 0.5 means policy is moving fast → drop LR |
Signal names follow what loss.py and the rollout logger emit; Miles’s wandb metrics
live under the train/, rollout/, perf/, multi_turn/, and passrate/ namespaces.
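A few of the red flags in the table are simple enough to check automatically. The sketch below applies them to a window of recent metric rows. The function name, the dictionary keys (taken from the table as written; your run may prefix them with a namespace), the "more than half the window" rule for grad_norm, and the 3x time-imbalance ratio are all assumptions for illustration, not Miles defaults.

```python
def check_recent(rows, clip_grad=1.0, time_ratio=3.0):
    """Return warning strings for a list of per-iteration metric dicts.

    `rows` is ordered oldest to newest; missing keys are skipped.
    """
    flags = []
    if not rows:
        return flags
    last = rows[-1]

    # Table: pg_clipfrac > 0.5 means the policy is moving fast.
    clipfrac = last.get("train/pg_clipfrac")
    if clipfrac is not None and clipfrac > 0.5:
        flags.append("pg_clipfrac > 0.5: consider lowering the learning rate")

    # Table: repeatedly hitting the clip threshold is a red flag.
    # "Repeatedly" is interpreted here as more than half of the window.
    norms = [r["grad_norm"] for r in rows if "grad_norm" in r]
    if norms and sum(n >= clip_grad for n in norms) > len(norms) / 2:
        flags.append("grad_norm is at or above clip_grad in most recent iterations")

    # Table: one of rollout_time / train_time much larger than the other.
    rollout_t, train_t = last.get("rollout_time"), last.get("train_time")
    if rollout_t and train_t:
        if max(rollout_t, train_t) / min(rollout_t, train_t) > time_ratio:
            flags.append("rollout_time and train_time are imbalanced")

    return flags
```

Trend-based signals such as loss spikes or reward collapse need more context than a fixed threshold, so they are left to the dashboards.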
Custom loggers
Replace the default rollout logger with your own to push to internal systems.
Profiling
| Tool | When |
|---|---|
| nvidia-smi dmon -s u | Quick sanity check on GPU utilization |
| nsys profile | Deep CUDA-level profiling |
| py-spy dump --pid <ray worker> | Find Python-side stalls |
| ray timeline | Inspect Ray task scheduling |
Built-in PyTorch profiler
The PyTorch profiler is wired into Miles via miles/utils/profile_utils.py. Flags
differ by backend.
With Megatron, you choose which sub-loop to profile.
The resulting traces can be opened in chrome://tracing or Perfetto.
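Traces that open in chrome://tracing or Perfetto are JSON in the Chrome trace event format, so they can also be summarized offline when a GUI is inconvenient. The sketch below totals the duration of complete ("X") events by name. The function names are hypothetical, and it assumes an uncompressed JSON trace; it does not depend on anything in profile_utils.py.

```python
import json
from collections import defaultdict


def load_trace(path):
    """Load a Chrome-format trace file (uncompressed JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def top_events(trace, n=5):
    """Return the n event names with the largest total duration.

    Accepts either the object form ({"traceEvents": [...]}) or a bare
    list of events. Durations are in microseconds, as in the trace.
    """
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for ev in events:
        # Only complete events ("ph" == "X") carry a "dur" field.
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev.get("name", "?")] += ev["dur"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Summed durations can double-count nested events, so treat the ranking as a starting point for what to inspect in the viewer.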
Where the log files live
| Source | Path |
|---|---|
| Trainer stdout | wherever you redirected ray job submit (or Ray dashboard) |
| Ray workers | ~/.ray/session_latest/logs/ |
| wandb local cache | wandb/run-<id>/files/ |
| FSDP profiler / memory snapshot | --tensorboard-dir, --memory-snapshot-path |
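When a run dies without an obvious error on the trainer's stdout, the Ray worker logs are usually the next place to look. As a rough sketch, the helper below lists the files in a log directory that contain a Python traceback. The function name is hypothetical, and the directory is passed in by the caller (for example the Ray workers path from the table).

```python
from pathlib import Path


def files_with_tracebacks(log_dir, marker="Traceback (most recent call last)"):
    """Return the names of files directly under log_dir that contain marker."""
    hits = []
    for path in sorted(Path(log_dir).expanduser().iterdir()):
        # Skip subdirectories; tolerate non-UTF-8 bytes in log files.
        if path.is_file() and marker in path.read_text(errors="replace"):
            hits.append(path.name)
    return hits
```

This reads each file fully into memory, which is fine for a quick triage but not for very large logs.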
Router endpoints
The router exposes a small FastAPI surface used internally by Miles:

| Endpoint | Method | What |
|---|---|---|
| /add_worker | POST | Register an SGLang engine |
| /list_workers | GET | List registered workers |
| any other path | GET / POST / PUT / DELETE | Proxied to a selected SGLang worker, e.g. /generate, /v1/chat/completions, /health |
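For debugging, the read-only endpoint can be queried with the standard library alone. This is a sketch under two assumptions that the table does not confirm: that you know the router's base address, and that /list_workers returns a JSON body. The function names are hypothetical.

```python
import json
import urllib.request


def router_url(base, path):
    """Join a router base address and an endpoint path with one slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def list_workers(base, timeout=5.0):
    """GET /list_workers and return the decoded body (assumed to be JSON)."""
    url = router_url(base, "/list_workers")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

/add_worker is left out on purpose, since its request format is not described here.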

