Context and Motivation

In the "Approximate Estimation" section of "Higher-order MuP: More Concise Yet More Sophisticated Spectral Condition Scaling", we previewed a conclusion: "For an $n\times m$ random matrix following a standard normal distribution, its spectral norm is approximately $\sqrt{n}+\sqrt{m}$."

Article Objective

This article supplements the discussion of this conclusion, providing a fast estimation method for the spectral norms of random matrices.

Random Matrix Theory

Problem Formulation

Consider a random matrix $\boldsymbol{W}\in\mathbb{R}^{n\times m}$, where each element is independently and identically sampled from the standard normal distribution $\mathcal{N}(0,1)$. We aim to estimate the spectral norm of $\boldsymbol{W}$, i.e., its largest singular value, taking $\mathbb{E}[\Vert\boldsymbol{W}\Vert_2]$ as the final estimate.
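
To make the object of study concrete, here is a minimal NumPy sketch (the sizes $n=1024$, $m=256$ are an arbitrary choice for illustration) that samples such a matrix and computes its spectral norm as its largest singular value.

```python
import numpy as np

# Sample an n x m matrix with i.i.d. standard normal entries.
n, m = 1024, 256                       # illustrative sizes (assumption)
rng = np.random.default_rng(0)
W = rng.standard_normal((n, m))

# The spectral norm is the largest singular value of W.
spectral_norm = np.linalg.norm(W, ord=2)
print(spectral_norm)                   # close to sqrt(n) + sqrt(m) = 48 for these sizes
```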

Theoretical Background

First, it should be noted that the analysis of random matrix behavior has grown into a specialized branch of its own. For estimating the spectral norms of Gaussian matrices, relevant keywords include the "Marchenko–Pastur distribution", the "Bai–Yin law", "Gordon's theorem", etc., which provide detailed results on singular value distributions. However, we do not intend to introduce these topics here. Instead, we present a method for quickly arriving at $\sqrt{n}+\sqrt{m}$.

The author distilled this method from Section 5.3.1 of "Introduction to the non-asymptotic analysis of random matrices". It should really be regarded as no more than a "popular science" explanation, meant to let readers quickly see where $\sqrt{n}+\sqrt{m}$ comes from. A rigorous proof would require filling in many tedious details, which we will not expand upon here.

Spectral Norm Estimation

Estimation Methodology

Our starting point is the identity:

(1) \[ \Vert\boldsymbol{W}\Vert_2 = \max_{\Vert \boldsymbol{u}\Vert=1, \Vert \boldsymbol{v}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} \]

where $\boldsymbol{u}\in\mathbb{R}^n,\boldsymbol{v}\in\mathbb{R}^m$. Direct computation of $\Vert\boldsymbol{W}\Vert_2$ is usually not easy, so it is natural to look for an approximation. We consider the following two "half-done" expressions:

(2) \[ \max_{\Vert \boldsymbol{u}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v}\qquad\qquad \max_{\Vert \boldsymbol{v}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} \]

Intuitively, compared to Equation (1), each of the expressions above completes only half of the work. We make a bold assumption: each also delivers roughly half of the full result. Adding them together should therefore give a good approximation of the final result:

(3) \[ \Vert\boldsymbol{W}\Vert_2 = \max_{\Vert \boldsymbol{u}\Vert=1, \Vert \boldsymbol{v}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} \approx \max_{\Vert \boldsymbol{u}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} + \max_{\Vert \boldsymbol{v}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} = \Vert \boldsymbol{W}\boldsymbol{v}\Vert + \Vert \boldsymbol{u}^{\top}\boldsymbol{W}\Vert \]

In other words, we sample $\boldsymbol{u},\boldsymbol{v}$ from the unit hyperspheres of $\mathbb{R}^n,\mathbb{R}^m$ respectively, and then use the above equation to obtain an approximation of $\boldsymbol{W}$'s spectral norm.
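
As a quick sanity check of this recipe, the following NumPy sketch (sizes again arbitrary) draws $\boldsymbol{u}$ and $\boldsymbol{v}$ uniformly from the unit spheres and compares $\Vert \boldsymbol{W}\boldsymbol{v}\Vert + \Vert \boldsymbol{u}^{\top}\boldsymbol{W}\Vert$ with the exact spectral norm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1024, 256                       # illustrative sizes (assumption)
W = rng.standard_normal((n, m))

# A normalized standard Gaussian vector is uniform on the unit sphere.
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
v = rng.standard_normal(m)
v /= np.linalg.norm(v)

approx = np.linalg.norm(W @ v) + np.linalg.norm(u @ W)   # right-hand side of Eq. (3)
exact = np.linalg.norm(W, ord=2)                         # left-hand side of Eq. (3)
print(approx, exact)                   # both land near sqrt(n) + sqrt(m) = 48
```

A single draw of $\boldsymbol{u},\boldsymbol{v}$ is enough here; in high dimensions the two terms vary little from draw to draw.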

Heuristic Justification

This approximation can be understood intuitively: When optimizing over both $\boldsymbol{u}$ and $\boldsymbol{v}$ simultaneously, the interaction between them is complex. By separating the optimization into two independent steps and summing their results, we capture the contributions from both dimensions. The assumption that each half contributes approximately equally is reasonable for isotropic random matrices where row and column spaces have similar statistical properties.

Expectation Calculation

Statistical Analysis

With this approximation, and using the fact that $\Vert \boldsymbol{W}\boldsymbol{v}\Vert$ and $\Vert \boldsymbol{u}^{\top}\boldsymbol{W}\Vert$ concentrate around their means, so that $\mathbb{E}[\Vert\cdot\Vert]\approx\sqrt{\mathbb{E}[\Vert\cdot\Vert^2]}$, we can compute:

(4) \[ \mathbb{E}[\Vert\boldsymbol{W}\Vert_2]\approx\mathbb{E}[\Vert \boldsymbol{W}\boldsymbol{v}\Vert] + \mathbb{E}[\Vert \boldsymbol{u}^{\top}\boldsymbol{W}\Vert] \approx \sqrt{\mathbb{E}[\Vert \boldsymbol{W}\boldsymbol{v}\Vert^2]} + \sqrt{\mathbb{E}[\Vert \boldsymbol{u}^{\top}\boldsymbol{W}\Vert^2]} \]

where:

(5) \[ \mathbb{E}[\Vert \boldsymbol{W}\boldsymbol{v}\Vert^2] = \mathbb{E}[ \boldsymbol{v}^{\top}\boldsymbol{W}^{\top}\boldsymbol{W}\boldsymbol{v}] = \boldsymbol{v}^{\top}\mathbb{E}[\boldsymbol{W}^{\top}\boldsymbol{W}]\boldsymbol{v} = \boldsymbol{v}^{\top}(n\boldsymbol{I}_m)\boldsymbol{v} = n \]
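
The middle step uses $\mathbb{E}[\boldsymbol{W}^{\top}\boldsymbol{W}]=n\boldsymbol{I}_m$, which follows entry by entry from the independence and unit variance of the elements:

\[ \mathbb{E}\big[(\boldsymbol{W}^{\top}\boldsymbol{W})_{ij}\big] = \sum_{k=1}^n \mathbb{E}[W_{ki}W_{kj}] = \begin{cases} n, & i=j \\ 0, & i\neq j \end{cases} \]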

Similarly, $\mathbb{E}[\Vert \boldsymbol{u}^{\top}\boldsymbol{W}\Vert^2]=m$, so:

(6) \[ \mathbb{E}[\Vert\boldsymbol{W}\Vert_2]\approx\sqrt{n} + \sqrt{m} \]

Accuracy Discussion

This result is actually quite accurate (which can be verified through simulation experiments). Specifically, if $n=ka,m=kb$, where $a,b$ are constants, then:

(7) \[ \lim_{k\to\infty} \frac{\Vert\boldsymbol{W}\Vert_2}{\sqrt{n} + \sqrt{m}} = 1,\qquad W_{ij}\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,1) \]
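
This convergence is easy to see empirically; here is a small NumPy simulation sketch (the aspect ratio $a=4$, $b=1$ and the list of sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 4, 1                                   # fixed aspect ratio (assumption)
for k in (16, 64, 256, 1024):
    n, m = k * a, k * b
    W = rng.standard_normal((n, m))
    ratio = np.linalg.norm(W, ord=2) / (np.sqrt(n) + np.sqrt(m))
    print(f"k={k}: ratio={ratio:.4f}")        # tends to 1 as k grows
```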

The reason for this accuracy is that we "cheated": the crucial Equation (3) was essentially reverse-engineered from the answer we already knew to be correct. Apart from Equation (3), the only properties we used were $\mathbb{E}[\boldsymbol{W}^{\top}\boldsymbol{W}]=n\boldsymbol{I}_m$ and $\mathbb{E}[\boldsymbol{W}\boldsymbol{W}^{\top}]=m\boldsymbol{I}_n$. The same approximation can therefore be expected to hold for any distribution with zero mean and unit variance.
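
As a quick illustration of this universality, the sketch below compares Gaussian entries against two other zero-mean, unit-variance choices (Rademacher and a scaled uniform distribution, picked here purely as examples):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2048, 512                              # illustrative sizes (assumption)
samplers = {
    "gaussian":   lambda: rng.standard_normal((n, m)),
    "rademacher": lambda: rng.choice([-1.0, 1.0], size=(n, m)),
    "uniform":    lambda: rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, m)),  # variance 1
}
for name, sample in samplers.items():
    W = sample()
    ratio = np.linalg.norm(W, ord=2) / (np.sqrt(n) + np.sqrt(m))
    print(f"{name}: {ratio:.4f}")             # all close to 1
```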

Minimal Singular Value

Extension to Minimal Singular Values

The spectral norm is the largest singular value. In fact, the same idea can be used to estimate the minimal singular value. Of course, "minimal" here needs to be defined more carefully: assuming $n\geq m$, the minimal singular value refers to the $m$-th singular value of $\boldsymbol{W}$ in descending order, which equals:

(8) \[ \sigma_{\min}(\boldsymbol{W}) = \min_{\Vert \boldsymbol{v}\Vert=1}\max_{\Vert \boldsymbol{u}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} \]

Note that the order of $\min$ and $\max$ here, and the variables over which they act, cannot be exchanged. As before, we take the sum of the two "half-done" expressions, obtained from separate $\min$ and $\max$ operations, as an approximation:

(9) \[ \sigma_{\min}(\boldsymbol{W}) = \min_{\Vert \boldsymbol{v}\Vert=1}\max_{\Vert \boldsymbol{u}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v}\approx \min_{\Vert \boldsymbol{v}\Vert=1}\boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} + \max_{\Vert \boldsymbol{u}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} = -\Vert \boldsymbol{u}^{\top}\boldsymbol{W}\Vert + \Vert \boldsymbol{W}\boldsymbol{v}\Vert \]

The subsequent steps are the same as before, resulting in $\mathbb{E}[\sigma_{\min}(\boldsymbol{W})]\approx\sqrt{n}-\sqrt{m}$.
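
A numerical check analogous to the earlier ones (NumPy, illustrative sizes with $n\geq m$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4096, 1024                       # illustrative sizes with n >= m (assumption)
W = rng.standard_normal((n, m))

# Singular values come back in descending order; the last one is the m-th.
sigma_min = np.linalg.svd(W, compute_uv=False)[-1]
print(sigma_min)                        # roughly sqrt(n) - sqrt(m)
print(np.sqrt(n) - np.sqrt(m))          # = 32.0 for these sizes
```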

Interpretative Insights

The $\sqrt{n}-\sqrt{m}$ approximation for the minimal singular value reveals interesting dimensional effects in random matrices. For tall matrices ($n \gg m$), the minimal singular value grows approximately as $\sqrt{n}$, indicating better conditioning. For square matrices ($n \approx m$), the minimal singular value becomes small, reflecting the well-known fact that random square matrices become increasingly ill-conditioned as dimensions grow.

Concluding Remarks

Summary and Limitations

This article has provided a fast approach for estimating the spectral norms of random matrices, or, more precisely, a popular-science-style heuristic explanation rather than a rigorous derivation. The argument could in principle be made rigorous, but that would require filling in many theoretical details, all of which have been skipped here.

Key contributions:

  1. Heuristic Derivation: Presented an intuitive, accessible derivation of the $\sqrt{n}+\sqrt{m}$ approximation for Gaussian random matrix spectral norms.
  2. Estimation Framework: Developed a decomposition approach that separates the optimization over row and column spaces, providing conceptual clarity.
  3. Extension to Minimal Singular Values: Showed how the same heuristic approach yields the $\sqrt{n}-\sqrt{m}$ approximation for minimal singular values.
  4. Practical Relevance: The approximations are remarkably accurate in practice and apply to any zero-mean, unit-variance distribution, not just Gaussian.
  5. Pedagogical Value: Provides an accessible entry point to random matrix theory concepts without requiring advanced mathematical machinery.

While not a substitute for rigorous random matrix theory, this heuristic approach offers valuable intuition and practical estimation tools for researchers and practitioners working with random matrices in machine learning, statistics, and numerical analysis.

Citation Information

Original Article: Su Jianlin. Fast Estimation of Spectral Norms for Random Matrices. Scientific Spaces.

How to cite this translation:

Su, J. Fast Estimation of Spectral Norms for Random Matrices [Translated by ScalingOpt Team]. Scientific Spaces.

BibTeX:

@article{su2025fast_spectral_estimation,
  title   = {Fast Estimation of Spectral Norms for Random Matrices},
  author  = {Su, Jianlin},
  journal = {Scientific Spaces},
  year    = {2025},
  url     = {https://kexue.fm/archives/10572},
  note    = {Translated by ScalingOpt Team}
}