ScalingOpt: Optimization at Scale
Discover, compare, and contribute to cutting-edge optimization algorithms designed for large-scale deep learning.
ScalingOpt
A research-oriented PyTorch codebase for optimizer-centric scaling studies in large language model training. Designed to make optimizer comparisons reproducible, fair, and ergonomically extensible.
30+ Optimizers
Single entrypoint
17 Model Configs
9M to 13B params
3 Data Pipelines
C4, Pile, OpenWebText
Multi-GPU DDP
Native torchrun
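As a rough illustration of the multi-GPU DDP / torchrun workflow listed above, here is a minimal setup sketch. The model, hyperparameters, and script name are placeholders for this example, not ScalingOpt's actual entrypoint or configuration.

```python
# Minimal DDP sketch, assuming a launch like `torchrun --nproc_per_node=N ddp_sketch.py`.
# The model and hyperparameters are stand-ins, not ScalingOpt's actual configs.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # torchrun populates RANK/WORLD_SIZE
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun for each process
    torch.cuda.set_device(local_rank)

    model = nn.Linear(1024, 1024).cuda(local_rank)   # stand-in for a transformer config
    model = DDP(model, device_ids=[local_rank])      # synchronize gradients across GPUs
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

    x = torch.randn(8, 1024, device=f"cuda:{local_rank}")
    loss = model(x).pow(2).mean()                    # dummy objective for illustration
    loss.backward()
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```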
Architectures
Optimizer Categories
Training Modes
Pretrain
SFT / DPO
Evaluation
Latest News
Recent updates from the ScalingOpt community
Mano: Restriking Manifold Optimization for LLM Training
A novel manifold optimization method that projects momentum onto the tangent space and constrains it to the rotational Oblique manifold. It outperforms AdamW and Muon with lower memory consumption and computational complexity.
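For intuition only: the sketch below shows the generic Oblique-manifold operations the description refers to (tangent-space projection of a momentum buffer and retraction back to unit-norm columns), not Mano's published update rule; the step size and shapes are arbitrary.

```python
# Generic oblique-manifold sketch: parameter columns live on unit spheres, and the
# momentum is projected onto the tangent space at W before the step. Not Mano's exact rule.
import torch

def project_to_tangent(W: torch.Tensor, M: torch.Tensor) -> torch.Tensor:
    """Remove, per column, the component of M along the corresponding column of W."""
    inner = (W * M).sum(dim=0, keepdim=True)     # per-column inner products w_i^T m_i
    return M - W * inner                         # tangent condition: w_i^T m_i = 0

def retract_to_oblique(W: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Re-normalize columns so W stays on the Oblique manifold (unit-norm columns)."""
    return W / (W.norm(dim=0, keepdim=True) + eps)

# Toy usage: one projected-momentum step on a random parameter matrix.
W = retract_to_oblique(torch.randn(64, 32))
grad = torch.randn_like(W)
momentum = project_to_tangent(W, grad)           # keep momentum tangent to the manifold
W = retract_to_oblique(W - 0.1 * momentum)       # step, then retract back to unit columns
```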
Jianlin Su's Blog Collection
Added a collection of in-depth articles by Jianlin Su (Scientific Spaces) covering optimization theory, Muon, and scaling laws.
Optimizer Summary Sheet
A systematic academic summary of optimization in deep learning, covering everything from foundational theory to modern adaptive and higher-order methods.
Members
Meet the members behind ScalingOpt. We thank them for their contributions.
Team member information is continuously updated; collaboration inquiries by email are welcome.
Featured Optimizers
Discover the most powerful and innovative optimization algorithms powering modern AI
SSO
2026: Fully μP-aligned optimization via spectral sphere steepest descent
Conda
2025: Column-Normalized Adam for Training LLMs Faster
Muon
2024: Orthogonal weight updates via Newton-Schulz iteration (see the sketch after these cards)
SOAP
2024: Improving and Stabilizing Shampoo using Adam
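For readers curious about the Muon entry, the sketch below shows the classical cubic Newton-Schulz iteration, which drives a matrix toward its nearest orthogonal factor. Muon's reference implementation uses a tuned quintic polynomial of the same form, so treat this as an illustration of the idea rather than the exact algorithm.

```python
# Classical (cubic) Newton-Schulz iteration toward the orthogonal polar factor of G.
# Muon orthogonalizes its update matrix with a tuned quintic variant of this iteration.
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 20, eps: float = 1e-7) -> torch.Tensor:
    """Approximate the orthogonal polar factor of G via X <- 1.5*X - 0.5*(X X^T) X."""
    X = G / (G.norm() + eps)            # scale so all singular values are below sqrt(3)
    transposed = X.shape[0] > X.shape[1]
    if transposed:                      # iterate on the wide orientation (smaller Gram matrix)
        X = X.T
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X.T if transposed else X

# Toy usage: the result has approximately orthonormal rows.
U = newton_schulz_orthogonalize(torch.randn(128, 256))
print((U @ U.T - torch.eye(128)).norm())   # should be close to zero
```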
Industry-Optimized Implementations
Production-ready libraries with improved distributed support and hardware optimization
Hugging Face
Optimizers integrated into Transformers (AdamW, Adafactor) with native support for distributed training and mixed precision; see the configuration sketch after these cards.
Meta Research
Cutting-edge optimization algorithms like Distributed Shampoo developed by Meta for large-scale model training.
NVIDIA TensorRT
Advanced model optimization toolkit for NVIDIA GPUs, focusing on quantization and inference acceleration.
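As a small illustration of the Hugging Face card above, the `optim` and `bf16` arguments of `transformers.TrainingArguments` select the Trainer's optimizer and enable mixed precision; the output directory and flag values here are arbitrary examples.

```python
# Minimal sketch: choosing the optimizer (and optionally mixed precision) for the
# Hugging Face Trainer via TrainingArguments. Values are illustrative only.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",          # arbitrary example path
    optim="adafactor",         # or "adamw_torch"; selects the Trainer's optimizer
    # bf16=True,               # enable bfloat16 mixed precision on supported hardware
)
```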
Deep Learning Optimization
Knowledge Summary
Our latest academic synthesis covers the entire spectrum of optimization in deep learning, from foundational gradient descent to the frontier of adaptive and second-order methods.
Why Choose ScalingOpt?
Everything you need to understand, implement, and scale optimization algorithms for modern AI
Extensive Optimizer Library
Explore optimization algorithms from foundational SGD to cutting-edge Adam-mini and Muon, with detailed implementations and PyTorch code (see the sketch after these cards).
Research & Learning Hub
Access research papers, tutorials, and educational content covering optimization theory, implementation guides, and latest developments.
Open Source & Community
Contribute to open-source implementations, join GitHub discussions, and collaborate with researchers worldwide on optimization algorithms.
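A minimal sketch of what comparing optimizers typically looks like in plain PyTorch; the name-based registry and hyperparameters below are made up for illustration and are not ScalingOpt's actual interface.

```python
# Illustrative name-based optimizer selection in plain PyTorch; registry and
# hyperparameters are invented for this sketch, not ScalingOpt's actual API.
import torch
import torch.nn as nn

OPTIMIZERS = {
    "sgd":   lambda params: torch.optim.SGD(params, lr=0.1, momentum=0.9),
    "adamw": lambda params: torch.optim.AdamW(params, lr=3e-4, weight_decay=0.1),
}

model = nn.Linear(512, 512)                      # stand-in for a real model
opt = OPTIMIZERS["adamw"](model.parameters())    # swap "adamw" for "sgd" to compare

x = torch.randn(16, 512)
loss = model(x).pow(2).mean()                    # dummy objective
loss.backward()
opt.step()
opt.zero_grad()
```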
Join the Optimization Community
Connect with researchers and practitioners exploring efficient AI and optimization algorithms. Discover, learn, and contribute to the future of machine learning optimization.