ScalingOpt: Optimization at Scale
Discover, compare, and contribute to cutting-edge optimization algorithms designed for large-scale deep learning.
Platform Statistics
Real-time data from our comprehensive optimizer database
Featured Optimizers
Discover the most powerful and innovative optimization algorithms powering modern AI
Apollo (2)
2024 · SGD-like Memory, AdamW-level Performance
Conda
2025 · Column-Normalized Adam for Training LLMs Faster
Muon
2024 · Orthogonal weight updates via Newton-Schulz iteration (see the sketch after this list)
SOAP
2024 · Improving and Stabilizing Shampoo using Adam
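The Muon entry above refers to orthogonalizing the momentum-based update of each 2-D weight matrix with a Newton-Schulz iteration. Below is a minimal sketch of that idea in PyTorch, assuming the quintic iteration and coefficients described in the public Muon write-up; the surrounding momentum buffer, learning rate, and tensor shapes are illustrative, not the canonical implementation.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately map a matrix to the nearest (semi-)orthogonal matrix
    using a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315      # coefficients from the published Muon description
    X = G / (G.norm() + eps)               # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T                            # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

# Hypothetical usage inside a training step: orthogonalize the momentum buffer
# of a 2-D weight matrix before applying it as the update.
W = torch.randn(256, 128)                  # a weight matrix
momentum = torch.randn_like(W)             # running momentum of its gradient
lr = 0.02
W -= lr * newton_schulz_orthogonalize(momentum)
```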
Industry-Optimized Implementations
Production-ready libraries with first-class distributed-training support and hardware-aware optimization
Hugging Face
Optimizers integrated into Transformers (AdamW, Adafactor), with native support for distributed training and mixed precision.
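The Transformers integration exposes these optimizers through the Trainer API. The sketch below shows one way to select them; the argument values are standard Transformers options, but check them against the library version you use.

```python
from transformers import TrainingArguments

# Minimal sketch: choosing the optimizer and mixed precision via TrainingArguments.
# "adafactor" and "adamw_torch" are standard values of the `optim` argument.
args = TrainingArguments(
    output_dir="checkpoints",
    optim="adafactor",                 # or "adamw_torch" for AdamW
    bf16=True,                         # bfloat16 mixed-precision training
    per_device_train_batch_size=8,
)
```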
Meta Research
Cutting-edge optimization algorithms like Distributed Shampoo developed by Meta for large-scale model training.
NVIDIA TensorRT
Advanced model optimization toolkit for NVIDIA GPUs, focusing on quantization and inference acceleration.
Why Choose ScalingOpt?
Everything you need to understand, implement, and scale optimization algorithms for modern AI
Extensive Optimizer Library
Explore optimization algorithms ranging from foundational SGD to cutting-edge Adam-mini and Muon, with detailed implementations and PyTorch code (a minimal PyTorch sketch follows these highlights).
Research & Learning Hub
Access research papers, tutorials, and educational content covering optimization theory, implementation guides, and latest developments.
Open Source & Community
Contribute to open-source implementations, join GitHub discussions, and collaborate with researchers worldwide on optimization algorithms.
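As a concrete illustration of the PyTorch code mentioned in the library highlight above, swapping optimizers in a standard training loop is a one-line change. The sketch below uses only built-in torch.optim optimizers and a toy model, not ScalingOpt-specific APIs.

```python
import torch
from torch import nn

model = nn.Linear(32, 1)

# Swapping optimizers is a one-line change; the rest of the loop stays the same.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

x, y = torch.randn(64, 32), torch.randn(64, 1)
for step in range(10):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
```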
Join the Optimization Community
Connect with researchers and practitioners exploring efficient AI and optimization algorithms. Discover, learn, and contribute to the future of machine learning optimization.