Deep Learning Optimization Summary Sheet
An academic synthesis of optimization theory, algorithms, and practical strategies for modern deep learning systems.
Systematic Taxonomy
1st Gen: Directional
Foundational methods focusing on gradient direction and noise reduction (SGD, Momentum).
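As a concrete reference, here is a minimal NumPy sketch of the classic heavy-ball momentum update; the function name, hyperparameter defaults, and toy objective are illustrative, not taken from the summary sheet.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """One heavy-ball step: accumulate a velocity, then move against it."""
    velocity = beta * velocity + grad   # exponentially averaged gradient direction
    w = w - lr * velocity               # step along the smoothed direction
    return w, velocity

# Usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = sgd_momentum_step(w, w, v)
print(w)  # converges toward the minimizer at the origin
```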
2nd Gen: Adaptive
Methods that use accumulated gradient statistics to set per-parameter learning rates (Adam, RMSProp).
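For the adaptive family, a minimal sketch of the Adam update with bias-corrected moment estimates, following Kingma & Ba (2015); the function name, defaults, and toy usage are illustrative.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: bias-corrected moments give per-parameter step sizes."""
    m = beta1 * m + (1 - beta1) * grad        # EMA of gradients (1st moment)
    v = beta2 * v + (1 - beta2) * grad ** 2   # EMA of squared gradients (2nd moment)
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Usage on the same toy quadratic f(w) = 0.5 * ||w||^2.
w = np.array([1.0, -2.0])
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 3001):
    w, m, v = adam_step(w, w, m, v, t)
```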
3rd Gen: Higher-order
Modern techniques that incorporate curvature information while remaining stable at scale (second-order methods, Shampoo).
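For the higher-order family, a minimal sketch of the Shampoo update for a matrix-shaped parameter, in the style of Gupta et al. (2018): the gradient is preconditioned on both sides by inverse fourth roots of accumulated statistics. The eigendecomposition-based root and all names and defaults are illustrative simplifications.

```python
import numpy as np

def inv_pth_root(mat, p, eps=1e-6):
    """Inverse p-th root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag((vals + eps) ** (-1.0 / p)) @ vecs.T  # eps regularizes

def shampoo_step(W, G, L, R, lr=0.1):
    """One Shampoo step: precondition G on both sides with accumulated
    row-space and column-space second-moment statistics."""
    L = L + G @ G.T                  # left (row-space) statistics
    R = R + G.T @ G                  # right (column-space) statistics
    W = W - lr * inv_pth_root(L, 4) @ G @ inv_pth_root(R, 4)
    return W, L, R

# Usage: a single step on a random 3x2 parameter with gradient G = W.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
L, R = np.zeros((3, 3)), np.zeros((2, 2))
W, L, R = shampoo_step(W, W.copy(), L, R)
```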
Key Theoretical Pillars
- Convergence analysis on non-convex loss landscapes (a representative bound is sketched below).
- The relationship between batch size and generalization.
- Hessian-based acceleration mechanisms.
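For orientation on the first pillar, the canonical non-convex guarantee (in the style of Ghadimi & Lan, 2013) bounds the smallest expected gradient norm rather than the optimality gap; the exact constants below are suppressed and the statement is a standard simplification, not quoted from the summary sheet.

```latex
% For an L-smooth objective f and unbiased stochastic gradients with
% bounded variance, SGD with a suitably decaying step size satisfies
\[
  \min_{1 \le t \le T} \; \mathbb{E}\!\left[ \lVert \nabla f(x_t) \rVert^2 \right]
  \;=\; O\!\left( \frac{1}{\sqrt{T}} \right),
\]
% where the hidden constant depends on L, the initial gap f(x_1) - f^{\ast},
% and the gradient-noise variance \sigma^2.
```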
Academic Citation
@misc{liu2024summary,
  author       = {Yifeng Liu},
  title        = {A Summary Sheet of Optimization in Deep Learning},
  year         = {2024},
  month        = dec,
  howpublished = {\url{https://github.com/lauyikfung/A-Summary-Sheet-of-Optimization-in-Deep-Learning/A_Summary_Sheet_of_Optimization_in_Deep_Learning.pdf}}
}
Author: Yifeng Liu