Optimizer Library

Explore our comprehensive collection of 62 optimization algorithms. Find the optimizer that best fits your large-scale machine learning project.

Understanding the anatomy and building blocks of modern optimization algorithms. Most optimizers follow this four-step pipeline.

Optimization Pipeline

1. Gradient Computation

The backward pass calculates gradients (\(\nabla \mathcal{L}\)) for all parameters.

loss.backward()

2. Gradient Estimation

Refining raw gradients using momentum or variance reduction to smooth updates.

# Momentum update: exponential moving average of past gradients
m_t = beta1 * m_prev + (1 - beta1) * g_t

3. Learning Rate Calculation

Adjusting the step size dynamically, either per parameter (adaptive methods such as Adam) or according to a schedule.

# Adaptive scaling: per-parameter step size from the second-moment estimate
step_size = lr / (v_t.sqrt() + eps)

4. Parameter Update

Applying the calculated step to update model weights.

p.data.add_(-step_size * m_t)
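
Putting the four steps together: below is a minimal, self-contained sketch of an Adam-style update loop in PyTorch. The function name, the external state dictionary, and the hyperparameter defaults are illustrative, and bias correction is omitted for brevity; this is not the implementation of any particular optimizer in the library.

# Minimal sketch combining the four pipeline steps (Adam-style, no bias correction).
import torch

def train_step(model, loss_fn, batch, state, lr=1e-3,
               beta1=0.9, beta2=0.999, eps=1e-8):
    inputs, targets = batch

    # 1. Gradient computation: the backward pass fills p.grad for every parameter.
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            g = p.grad
            if p not in state:
                state[p] = (torch.zeros_like(p), torch.zeros_like(p))
            m, v = state[p]

            # 2. Gradient estimation: exponential moving averages of the
            #    gradient (momentum) and of its elementwise square.
            m.mul_(beta1).add_(g, alpha=1 - beta1)
            v.mul_(beta2).addcmul_(g, g, value=1 - beta2)

            # 3. Learning-rate calculation: per-parameter adaptive step size.
            step_size = lr / (v.sqrt() + eps)

            # 4. Parameter update: apply the scaled step to the weights.
            p.add_(-step_size * m)

    return loss.item()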

Modern Strategies

Memory Efficiency (Key Trend)

Memory-efficient optimizers reduce optimizer-state memory by factorizing second-moment statistics or by projecting gradients into a low-rank subspace.
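
For example, second-moment factorization (in the spirit of Adafactor) keeps one row statistic and one column statistic per 2-D weight instead of a full elementwise accumulator; gradient-projection methods shrink state in a similar way by working in a low-rank subspace. A hedged sketch of the factorization idea, with an illustrative function name and eps value:

# Sketch: factored second-moment statistics for an (n, m) weight matrix.
# Stores O(n + m) state instead of O(n * m).
import torch

def factored_second_moment(grad_2d, row_acc, col_acc, beta2=0.999, eps=1e-30):
    sq = grad_2d.pow(2) + eps
    # Running means of the squared gradient along rows and columns.
    row_acc.mul_(beta2).add_(sq.mean(dim=1), alpha=1 - beta2)  # shape (n,)
    col_acc.mul_(beta2).add_(sq.mean(dim=0), alpha=1 - beta2)  # shape (m,)
    # Rank-1 reconstruction of the full second-moment estimate.
    return torch.outer(row_acc, col_acc) / row_acc.mean()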

Acceleration (Performance)

Acceleration comes from adaptive update mechanisms guided by both the direction and the magnitude of recent gradients.
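
As one classical example, a Nesterov-style momentum step accelerates training by combining the current gradient with a decayed history of past gradients (a look-ahead step). A hedged sketch; the function name and hyperparameters are illustrative:

# Sketch: Nesterov-accelerated momentum step for a single parameter tensor.
import torch

def nesterov_step(p, g, m, lr=1e-2, mu=0.9):
    # Momentum buffer: decayed accumulation of past gradients.
    m.mul_(mu).add_(g)
    # Look-ahead update: current gradient plus the decayed momentum buffer.
    p.add_(g + mu * m, alpha=-lr)
    return p, m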

Stability (Scale)

Stability-focused optimizers counter numerical instability under extreme training conditions, such as very large batch sizes or low-precision arithmetic.
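
One widely used stabilization technique is to clip the proposed update by its root-mean-square before applying it, so that occasional outlier gradients cannot blow up the weights. A hedged sketch; the function name and threshold are illustrative:

# Sketch: RMS-based update clipping for large-batch / low-precision training.
import torch

def clip_update_by_rms(update, threshold=1.0):
    rms = update.pow(2).mean().sqrt()
    # Shrink the update only when its RMS exceeds the threshold.
    scale = torch.clamp(rms / threshold, min=1.0)
    return update / scale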

Orthogonal Updates (Emerging)

Emerging optimizers apply orthogonal transformations to weight updates to improve feature learning and training stability.
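
Concretely, the update for a 2-D weight can be replaced by its nearest (semi-)orthogonal matrix, for example via a singular value decomposition; optimizers in this family typically approximate the same operation with a few Newton-Schulz iterations instead of an exact SVD. A hedged sketch with an illustrative function name:

# Sketch: orthogonalize a 2-D update by dropping its singular values,
# replacing G = U S V^T with U V^T.
import torch

def orthogonalize_update(grad_2d):
    u, _, vh = torch.linalg.svd(grad_2d, full_matrices=False)
    return u @ vh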
