Deep Learning Optimization Summary Sheet

An academic synthesis of optimization theories, algorithms, and practical strategies in modern deep learning systems.

Systematic Taxonomy

1st Gen: Directional

Foundational methods focusing on gradient direction and noise reduction (SGD, Momentum).
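
A minimal NumPy sketch of the heavy-ball update these methods share, where a velocity term averages past gradients to damp noise; the function name and hyperparameters below are illustrative, not taken from the summary sheet:

import numpy as np

def sgd_momentum_step(theta, velocity, grad, lr=0.01, mu=0.9):
    # Heavy-ball momentum: velocity is a decaying average of past
    # gradients, smoothing the noisy stochastic gradient direction.
    velocity = mu * velocity - lr * grad
    return theta + velocity, velocity

# Toy usage on f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta = np.array([1.0, -2.0])
velocity = np.zeros_like(theta)
for _ in range(100):
    theta, velocity = sgd_momentum_step(theta, velocity, grad=theta)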

2nd Gen: Adaptive

Methods using historical gradient information for per-parameter learning rates (Adam, RMSProp).
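
For concreteness, a minimal NumPy sketch of one Adam step, combining a momentum-like first-moment estimate with a second-moment estimate that rescales each coordinate; the helper name and default hyperparameters are illustrative:

import numpy as np

def adam_step(theta, m, v, grad, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its elementwise square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction counteracts the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step: coordinates with large accumulated variance
    # receive a proportionally smaller update.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 501):  # t starts at 1 so the bias correction is defined
    theta, m, v = adam_step(theta, m, v, grad=theta, t=t)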

3rd Gen: Higher-order

Modern techniques that incorporate curvature information while remaining numerically stable at scale (second-order methods, Shampoo).
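
A rough sketch of the Shampoo idea for a single matrix-shaped parameter, assuming dense preconditioners and exact eigendecompositions; production implementations recompute the inverse roots only periodically and add damping for stability:

import numpy as np

def matrix_power(mat, p, eps=1e-6):
    # Fractional power of a symmetric PSD matrix via eigendecomposition;
    # eigenvalues are floored at eps so negative powers stay finite.
    w, q = np.linalg.eigh(mat)
    return q @ np.diag(np.maximum(w, eps) ** p) @ q.T

def shampoo_step(theta, L, R, grad, lr=0.1):
    # Left and right preconditioners accumulate second-moment statistics
    # of the gradient along each dimension of the matrix parameter.
    L = L + grad @ grad.T
    R = R + grad.T @ grad
    precond_grad = matrix_power(L, -0.25) @ grad @ matrix_power(R, -0.25)
    return theta - lr * precond_grad, L, R

rng = np.random.default_rng(0)
theta = rng.standard_normal((4, 3))
L, R = np.zeros((4, 4)), np.zeros((3, 3))
for _ in range(50):
    theta, L, R = shampoo_step(theta, L, R, grad=theta)  # toy quadratic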

Key Theoretical Pillars

  • Convergence analysis on non-convex landscapes (see the bound sketched after this list).
  • Relationship between batch size and generalization.
  • Hessian-based acceleration mechanisms.
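
For the first pillar, the standard stationarity guarantee for SGD on a smooth, possibly non-convex objective can be sketched as follows; this is the textbook bound, not a result specific to the summary sheet:

% For an L-smooth f with stochastic-gradient variance at most sigma^2,
% T steps of SGD with constant step size eta <= 1/L satisfy:
\min_{0 \le t < T} \mathbb{E}\left[\|\nabla f(\theta_t)\|^2\right]
  \le \frac{2\left(f(\theta_0) - f^{\ast}\right)}{\eta T} + L\,\eta\,\sigma^2.
% Choosing eta proportional to 1/sqrt(T) balances the two terms and
% recovers the familiar O(1/sqrt(T)) non-convex rate.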

Academic Citation

@misc{liu2024summary,
  author       = {Yifeng Liu},
  title        = {A Summary Sheet of Optimization in Deep Learning},
  year         = {2024},
  month        = {Dec},
  howpublished = {\url{https://github.com/lauyikfung/A-Summary-Sheet-of-Optimization-in-Deep-Learning/A_Summary_Sheet_of_Optimization_in_Deep_Learning.pdf}}
}
Author: Yifeng Liu