publications

* denotes equal contribution

2024

  1. NeurIPS
    Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
    Atli Kosson, Bettina Messmer, and Martin Jaggi
    In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
  2. ICML
    Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
    Atli Kosson*, Bettina Messmer*, and Martin Jaggi
    In Forty-first International Conference on Machine Learning, 2024
  3. NeurIPS
    Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
    Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro von Werra, and 1 more author
    In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
  4. AAAI
    Ghost Noise for Regularizing Deep Neural Networks
    Atli Kosson, Dongyang Fan, and Martin Jaggi
    Proceedings of the AAAI Conference on Artificial Intelligence, 2024

2023

  1. NeurIPS
    Multiplication-Free Transformer Training via Piecewise Affine Operations
    Atli Kosson and Martin Jaggi
    In Thirty-seventh Conference on Neural Information Processing Systems, 2023

2021

  1. MLSys
    Pipelined Backpropagation at Scale: Training Large Models without Batches
    Atli Kosson*, Vitaliy Chiley*, Abhinav Venigalla, Joel Hestness, and Urs Köster
    In Proceedings of Machine Learning and Systems, 2021

2020

  1. Workshop
    Adaptive Braking for Mitigating Gradient Delay
    Abhinav Venigalla*, Atli Kosson*, Vitaliy Chiley, and Urs Köster
    In ICML 2020 Workshop on Beyond first order methods in machine learning systems, 2020

2019

  1. NeurIPS
    Online Normalization for Training Neural Networks
    Vitaliy Chiley, Ilya Sharapov, Atli Kosson, Urs Köster, Ryan Reece, and 3 more authors
    Advances in Neural Information Processing Systems, 2019
