Scheduler Library

Explore our comprehensive collection of 10+ learning rate schedulers. Find the perfect strategy for your model training.

Why Learning Rate Schedulers Matter

The learning rate is arguably one of the most important hyperparameters in machine learning. While a fixed learning rate can work, it often leads to suboptimal results. Learning rate schedulers offer a more dynamic approach by automatically adjusting the learning rate during training.

Fixed learning rates create several problems:

  • Learning rate too high: The model oscillates around the optimal point, unable to settle into the minimum
  • Learning rate too low: Training progresses extremely slowly, wasting computational resources and potentially getting trapped in poor local minima
  • No adaptation: A fixed rate cannot adjust to the different phases of training (rapid exploration early, careful fine-tuning late)

Learning rate schedulers address these issues by dynamically adjusting your model's learning rate based on training progress or performance metrics. Early in training, when weights are far from optimal, a higher learning rate helps make rapid progress. As the model approaches convergence, a lower learning rate allows for fine-tuning and prevents overshooting the minimum.
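The effect is easy to see on a toy problem. The sketch below is purely illustrative (the function name and numbers are made up, and no framework is involved): it minimizes f(x) = x² with a large learning rate early and a small one late, versus a fixed low rate.

```python
# Toy illustration (names and numbers are illustrative, not from any library):
# minimize f(x) = x^2, whose gradient is 2x, with a two-phase learning rate.
def train(lr_early, lr_late, switch_step, steps, x0=10.0):
    x = x0
    for step in range(steps):
        # large steps while far from the minimum, small steps once close
        lr = lr_early if step < switch_step else lr_late
        x -= lr * 2 * x  # gradient-descent update: x <- x - lr * f'(x)
    return x

# Two-phase schedule: converges essentially to the minimum.
scheduled = train(lr_early=0.3, lr_late=0.01, switch_step=20, steps=100)

# Fixed low rate: still far from the minimum after the same step budget.
fixed_low = train(lr_early=0.01, lr_late=0.01, switch_step=0, steps=100)
```

With the same number of steps, the scheduled run lands very close to the minimum while the fixed low rate is still slowly crawling toward it.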

Reference: A Gentle Introduction to Learning Rate Schedulers

Choosing the Right Scheduler

When you know your training phases

StepLR

If you understand when your model should transition from exploration to fine-tuning, StepLR gives you explicit control over these phases.
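As a sketch, StepLR's decay rule multiplies the base rate by a factor gamma once every `step_size` epochs (the helper function and defaults here are illustrative; in PyTorch this logic lives in `torch.optim.lr_scheduler.StepLR`):

```python
def step_lr(base_lr, epoch, step_size, gamma=0.1):
    # Illustrative sketch of StepLR's rule:
    # the lr drops by a factor of `gamma` once every `step_size` epochs.
    return base_lr * gamma ** (epoch // step_size)

# With base_lr=0.1 and step_size=30:
# epochs 0-29 use 0.1, epochs 30-59 use 0.01, epochs 60-89 use 0.001, ...
```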

When you want smooth decay

ExponentialLR

For stable, continuous reduction without sudden changes that might disrupt training momentum.
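A sketch of the rule (illustrative helper; PyTorch's `torch.optim.lr_scheduler.ExponentialLR` implements the same decay): the rate shrinks by a constant factor every epoch, so there are no step-change discontinuities.

```python
def exponential_lr(base_lr, epoch, gamma=0.95):
    # Illustrative sketch of ExponentialLR's rule:
    # the lr shrinks smoothly by a constant factor `gamma` every epoch.
    return base_lr * gamma ** epoch
```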

When you want to escape local minima

CosineAnnealingLR

The cosine pattern provides natural exploration phases that can help the model find better solutions.
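The rule traces half a cosine wave from the base rate down to a floor over `t_max` epochs (illustrative helper; PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` computes the same curve):

```python
import math

def cosine_annealing_lr(base_lr, epoch, t_max, eta_min=0.0):
    # Illustrative sketch of CosineAnnealingLR's rule: the lr follows a
    # half-cosine from base_lr at epoch 0 down to eta_min at epoch t_max.
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))
```

The decay is slow at the start, fastest in the middle, and slow again near the end, which is gentler than a hard step drop.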

When you're uncertain about scheduling

ReduceLROnPlateau

This adaptive approach responds to actual training progress, making it excellent when you're unsure about optimal timing. Recommended as a default choice.
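The idea can be sketched in a few lines (this hand-rolled class is an illustration of the mechanism, not the PyTorch `ReduceLROnPlateau` API): watch a validation metric and cut the rate only when it stops improving.

```python
class PlateauReducer:
    """Illustrative sketch of plateau-based lr reduction (names and defaults
    are assumptions, not the PyTorch API): cut the lr by `factor` when the
    monitored validation loss fails to improve for `patience` checks."""

    def __init__(self, lr, factor=0.1, patience=3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss      # new best: reset the patience counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1
            if self.bad_checks > self.patience:
                self.lr *= self.factor  # plateau detected: reduce the lr
                self.bad_checks = 0
        return self.lr
```

Because the trigger is the metric itself rather than a preset epoch count, the schedule adapts to however long each training phase actually takes.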

For Schedule-Free optimization

Schedule-Free

Eschews learning rate schedules entirely in favor of a novel iterate-averaging scheme, achieving state-of-the-art performance without needing to specify a stopping step T.
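A rough sketch of the iterate-averaging idea, loosely following the Schedule-Free update (the constants, weighting, and hyperparameters below are simplified assumptions for a toy problem, not the reference implementation): gradients are taken at an interpolation between a fast iterate and its running average, and the average is what you evaluate.

```python
def schedule_free_sgd(grad, x0, lr=0.1, beta=0.5, steps=200):
    # Simplified sketch of Schedule-Free SGD's iterate averaging:
    # z is the fast "base" iterate, x the running average used at eval time,
    # and the gradient is evaluated at an interpolation y between them.
    x = z = x0
    for t in range(steps):
        y = (1 - beta) * z + beta * x  # gradient evaluation point
        z = z - lr * grad(y)           # plain SGD step on the base iterate
        c = 1.0 / (t + 2)              # shrinking averaging weight (assumed form)
        x = (1 - c) * x + c * z        # online average of the z iterates
    return x                           # the averaged iterate is the model you use

# Toy problem: minimize f(x) = x^2, gradient 2x, with a constant lr and no schedule.
# beta = 0.5 is chosen for this toy; practical values are typically larger.
x_star = schedule_free_sgd(lambda x: 2 * x, x0=10.0)
```

Note that the learning rate never decays; the averaging alone drives the evaluated point toward the minimum.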

Tip: Start with ReduceLROnPlateau for most problems, then experiment with others based on your specific needs.
