Why do AI researchers care about training dynamics and not just the final model?