2024

NeurIPS
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
Atli Kosson, Bettina Messmer, and Martin Jaggi
In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

ICML
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Atli Kosson*, Bettina Messmer*, and Martin Jaggi
In ICML, 2024

NeurIPS
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Alexander Hagele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro Von Werra, and 1 more author
In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

AAAI
Ghost Noise for Regularizing Deep Neural Networks
Atli Kosson, Dongyang Fan, and Martin Jaggi
Proceedings of the AAAI Conference on Artificial Intelligence, 2024

2023

NeurIPS
Multiplication-Free Transformer Training via Piecewise Affine Operations
Atli Kosson and Martin Jaggi
In Thirty-seventh Conference on Neural Information Processing Systems, 2023

2021

MLSys
Pipelined Backpropagation at Scale: Training Large Models without Batches
Atli Kosson*, Vitaliy Chiley*, Abhinav Venigalla, Joel Hestness, and Urs Köster
In Proceedings of Machine Learning and Systems, 2021

2020

Workshop
Adaptive Braking for Mitigating Gradient Delay
Abhinav Venigalla*, Atli Kosson*, Vitaliy Chiley, and Urs Köster
In ICML 2020 Workshop on Beyond First Order Methods in Machine Learning Systems, 2020

2019

NeurIPS
Online Normalization for Training Neural Networks
Vitaliy Chiley, Ilya Sharapov, Atli Kosson, Urs Köster, Ryan Reece, and 3 more authors
Advances in Neural Information Processing Systems, 2019