Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism

Published in CPAL, 2025

Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar.
