Scheduler Library
Explore our comprehensive collection of 10+ learning rate schedulers. Find the perfect strategy for your model training.
Why Learning Rate Schedulers Matter
The learning rate is arguably one of the most important hyperparameters in machine learning. While a fixed learning rate can work, it often leads to suboptimal results. Learning rate schedulers offer a more dynamic approach by automatically adjusting the learning rate during training.
Fixed learning rates create several problems:
- Learning rate too high: The model oscillates around the optimal point, unable to settle into the minimum
- Learning rate too low: Training progresses extremely slowly, wasting computational resources and potentially getting trapped in poor local minima
- No adaptation: A single fixed rate cannot respond to the different phases of training, such as rapid early progress versus careful late-stage fine-tuning
Learning rate schedulers address these issues by dynamically adjusting your model's learning rate based on training progress or performance metrics. Early in training, when weights are far from optimal, a higher learning rate helps make rapid progress. As the model approaches convergence, a lower learning rate allows for fine-tuning and prevents overshooting the minimum.
Reference: A Gentle Introduction to Learning Rate Schedulers
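To make the mechanics concrete, here is the common pattern in PyTorch, where the scheduler classes discussed below live. This is a minimal sketch: the model, data, and hyperparameter values are placeholders, not recommendations.

```python
import torch

# Toy model and optimizer purely for illustration; any torch.nn.Module works.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

x, y = torch.randn(64, 10), torch.randn(64, 1)
for epoch in range(100):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()  # adjust the learning rate once per epoch, after optimizer.step()
```

The per-scheduler snippets below reuse the `optimizer` from this sketch and only swap out the scheduler line.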
Choosing the Right Scheduler
When you know your training phases
→ StepLR
If you understand when your model should transition from exploration to fine-tuning, StepLR gives you explicit control over these phases.
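As a sketch of that control, PyTorch's StepLR multiplies the learning rate by `gamma` every `step_size` epochs, so the transitions land exactly where you place them. The values here are illustrative:

```python
from torch.optim.lr_scheduler import StepLR

# Illustrative schedule: lr = 0.1 for epochs 0-29, 0.01 for 30-59, 0.001 after.
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
```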
When you want smooth decay
→ ExponentialLR
For stable, continuous reduction without sudden changes that might disrupt training momentum.
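A minimal sketch with an illustrative decay factor:

```python
from torch.optim.lr_scheduler import ExponentialLR

# Smooth geometric decay: lr_t = lr_0 * gamma ** t, one small step per epoch.
scheduler = ExponentialLR(optimizer, gamma=0.95)
```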
When you want to escape local minima
→ CosineAnnealingLR
The cosine pattern provides natural exploration phases that can help the model find better solutions.
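A minimal sketch, with `T_max` and `eta_min` chosen only for illustration:

```python
from torch.optim.lr_scheduler import CosineAnnealingLR

# Anneals the LR from its initial value down to eta_min over T_max epochs
# following a cosine curve; for explicit periodic restarts, PyTorch also
# provides CosineAnnealingWarmRestarts.
scheduler = CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-5)
```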
When you're uncertain about scheduling
→ ReduceLROnPlateau
This adaptive approach responds to actual training progress, making it excellent when you're unsure about optimal timing. Recommended as a default choice.
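One usage detail worth showing: unlike the schedulers above, ReduceLROnPlateau's `step()` takes the metric being monitored. The `validate` helper below is hypothetical and stands in for your own evaluation code:

```python
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Cuts the LR by `factor` once the monitored metric has not improved
# for `patience` consecutive epochs.
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=10)

for epoch in range(100):
    val_loss = validate(model)  # hypothetical helper returning validation loss
    scheduler.step(val_loss)    # pass the metric; the scheduler decides when to decay
```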
When you'd rather not schedule at all
→ Schedule-Free
Eschews learning rate schedules entirely in favor of a novel iterate averaging scheme; its authors report state-of-the-art performance without the need to specify a stopping step T in advance.
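A minimal sketch using Meta's schedulefree package (`pip install schedulefree`), assuming its AdamWScheduleFree optimizer; the lr value is illustrative. One caveat from that library: the optimizer must be switched between train and eval modes so the correct set of averaged weights is active:

```python
import torch
import schedulefree

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=2.5e-3)

optimizer.train()  # use training iterates for optimizer steps
# ... usual forward/backward/optimizer.step() loop; no scheduler.step() needed ...

optimizer.eval()   # switch to averaged weights before evaluation or checkpointing
```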
Tip: Start with ReduceLROnPlateau for most problems, then experiment with others based on your specific needs.
A chronological journey through the history of LR schedulers