ScalingOpt: Optimization at Scale

Discover, compare, and contribute to cutting-edge optimization algorithms designed for large-scale deep learning.

Explore Optimizers · Give us a Star on GitHub!
OUR CODEBASE

ScalingOPT

A research-oriented PyTorch codebase for optimizer-centric scaling studies in large language model training, designed to make optimizer comparisons reproducible, fair, and easy to extend.

30+ Optimizers: single entrypoint

17 Model Configs: 9M to 13B params

3 Data Pipelines: C4, Pile, OpenWebText

Multi-GPU DDP: native torchrun
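As a rough sketch of how a single-entrypoint, registry-based optimizer switch can work, consider the snippet below. The `OPTIMIZERS` registry, `build_optimizer` helper, and the `train.py` flags in the comment are illustrative assumptions, not ScalingOPT's actual API.

```python
# Illustrative sketch only: a registry-based optimizer switch behind one entrypoint.
# Names (OPTIMIZERS, build_optimizer, train.py flags) are assumptions, not ScalingOPT's API.
import torch
from torch import nn

OPTIMIZERS = {
    "adamw": lambda params, lr: torch.optim.AdamW(params, lr=lr, weight_decay=0.1),
    "sgd": lambda params, lr: torch.optim.SGD(params, lr=lr, momentum=0.9),
    # ... 30+ entries registered the same way
}

def build_optimizer(name: str, model: nn.Module, lr: float) -> torch.optim.Optimizer:
    """Resolve an optimizer by config string so every run shares the same entrypoint."""
    return OPTIMIZERS[name](model.parameters(), lr)

if __name__ == "__main__":
    model = nn.Linear(512, 512)          # stand-in for a 9M to 13B model config
    optimizer = build_optimizer("adamw", model, lr=3e-4)
    print(type(optimizer).__name__)      # AdamW

# A multi-GPU DDP run would then be launched with torchrun, for example:
#   torchrun --nproc_per_node=8 train.py --optimizer adamw --model llama_1b --dataset c4
# (script name and flags shown for illustration only)
```

With this pattern, swapping optimizers reduces to changing one config string, which is what makes head-to-head comparisons across model scales and data pipelines reproducible.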

ScalingOPT LLM

Architectures

LLaMA 9M–13B · GPT-2 124M · Qwen3 0.6B–1.7B

Optimizer Categories

AdamW · Muon · SOAP · Shampoo · APOLLO · MARS · Lion · Sophia · Adam-Mini · LAMB · GaLore · +20 more

Training Modes

Pretrain

SFT / DPO

Evaluation

WHAT'S NEW

Latest News

Recent updates from the ScalingOpt community

New Optimizer · Jan 2026

Mano: Restriking Manifold Optimization for LLM Training

A novel manifold optimization method that projects momentum onto the tangent space and constrains it to a rotational oblique manifold. It outperforms AdamW and Muon while using less memory and compute.
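The precise update rule is in the linked paper; as a minimal illustration of the tangent-space idea only (a textbook oblique-manifold projection, not the Mano update itself), projecting a momentum matrix onto the tangent space at a column-normalized weight matrix looks like this:

```python
import torch

def oblique_tangent_projection(W: torch.Tensor, M: torch.Tensor) -> torch.Tensor:
    """Project momentum M onto the tangent space of the oblique manifold at W.

    The oblique manifold is the set of matrices with unit-norm columns; its tangent
    space at W contains directions with no component along each column of W.
    Textbook projection for illustration; not the actual Mano algorithm.
    """
    W = W / W.norm(dim=0, keepdim=True)         # ensure unit-norm columns
    radial = (W * M).sum(dim=0, keepdim=True)   # per-column component of M along W
    return M - W * radial                       # keep only the tangential part

W, M = torch.randn(8, 4), torch.randn(8, 4)
T = oblique_tangent_projection(W, M)
print((W / W.norm(dim=0, keepdim=True) * T).sum(dim=0))  # ~0 for every column
```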

Explore Optimizer
Blog Update · Dec 2025

Jianlin Su's Blog Collection

Added a collection of in-depth articles by Jianlin Su (Scientific Spaces) covering optimization theory, Muon, and scaling laws.

Explore Blogs
New Resource · Jan 2026

Optimizer Summary Sheet

A systematic academic summary of optimization in deep learning, spanning foundational theory through modern adaptive and higher-order methods.

Open Reader

Members

Meet the members behind ScalingOpt. We thank them for their contributions.

Juanxi Tian
Personal Page

Yifeng Liu
Personal Page

Yongliang Wu
Personal Page

Sicheng Feng
Personal Page

Junhan Zhu
Personal Page

Qunzhong Wang
Personal Page

Yiming Dong
Google Scholar

Junjie Wang
Google Scholar

Team member information is updated continuously. We welcome collaboration inquiries by email.

Featured Optimizers

Discover the most innovative optimization algorithms powering modern AI

SSO

2026

Fully μP-aligned optimization via spectral sphere steepest descent

Second-order

Conda

2025

Column-Normalized Adam for Training LLMs Faster

First-order

Muon

2024

Orthogonal weight updates via Newton-Schulz iteration (see the sketch after this list)

Second-order

SOAP

2024

Improving and Stabilizing Shampoo using Adam

Second-order
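For readers curious about the orthogonalization step behind Muon, the snippet below is a generic textbook sketch of the classical cubic Newton-Schulz iteration, which converges to the nearest semi-orthogonal matrix. Muon's production implementation uses a tuned quintic variant with far fewer steps, so treat this only as an illustration.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 15) -> torch.Tensor:
    """Approximately orthogonalize an update matrix G.

    Classical cubic Newton-Schulz: X <- 1.5*X - 0.5*(X X^T)X converges to the nearest
    semi-orthogonal matrix when the singular values of the starting point lie in
    (0, sqrt(3)). Generic textbook sketch, not Muon's tuned quintic kernel.
    """
    X = G / (G.norm() + 1e-7)        # Frobenius scaling puts all singular values below 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X

G = torch.randn(64, 32)              # e.g. the momentum matrix of one weight
O = newton_schulz_orthogonalize(G)
print((O.T @ O - torch.eye(32)).abs().max())  # close to 0: columns are near-orthonormal
```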
New Academic Resource

Deep Learning Optimization Knowledge Summary

Our latest academic synthesis covers the entire spectrum of optimization in deep learning, from foundational gradient descent to the frontiers of second-order adaptive methods.

Why Choose ScalingOpt?

Everything you need to understand, implement, and scale optimization algorithms for modern AI

Extensive Optimizer Library

Explore optimization algorithms ranging from foundational SGD to cutting-edge Adam-mini and Muon, with detailed implementations and PyTorch code.

Research & Learning Hub

Access research papers, tutorials, and educational content covering optimization theory, implementation guides, and latest developments.

Open Source & Community

Contribute to open-source implementations, join GitHub discussions, and collaborate with researchers worldwide on optimization algorithms.

Join the Optimization Community

Connect with researchers and practitioners exploring efficient AI and optimization algorithms. Discover, learn, and contribute to the future of machine learning optimization.