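Linear Algebra · SVD | The Early History of the Singular Value Decomposition (Part 2)

JAY.LIN · filed under Uncategorized

2025-09-23 · about 17,629 characters · estimated reading time 36 minutes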


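Note: this post reproduces English source text related to “Linear Algebra · SVD”; the Chinese rendering published alongside it was machine-translated and not proofread.
If anything reads oddly, consult the original.
Because of CSDN’s per-post length limit, the material is split into two posts; this is the second.

Linear Algebra · SVD | The Early History of the Singular Value Decomposition (Part 1) - CSDN Blog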

6. Weyl [64, 1912]

An important application of the approximation theorem is the determination of the rank of a matrix in the presence of error. If $A$ is of rank $k$ and $\tilde{A} = A + E$, where $E$ is an error matrix, then the last $n - k$ singular values of $\tilde{A}$ satisfy

$$(6.1) \quad \tilde{\sigma}_{k+1}^{2} + \cdots + \tilde{\sigma}_{n}^{2} \leq \| E \|^{2},$$

so that the defect in rank of $A$ will be manifest in the size of the trailing singular values of $\tilde{A}$.
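As a small numerical illustration of (6.1), the sketch below builds a hypothetical rank-$k$ matrix, perturbs it, and compares the trailing singular values of the perturbed matrix with the size of the perturbation. It assumes NumPy and takes $\| \cdot \|$ to be the Frobenius norm; the sizes, seed, and noise level are made up, and the thresholding rule at the end is one simple heuristic rather than a method taken from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 8, 6, 3  # illustrative sizes (assumed)

# A has rank k by construction; E is a small random error matrix.
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
E = 1e-6 * rng.standard_normal((m, n))
A_tilde = A + E

# Singular values of the perturbed matrix, largest first.
sigma_tilde = np.linalg.svd(A_tilde, compute_uv=False)

# Left side of (6.1): squares of the last n - k singular values of A_tilde.
trailing = float(np.sum(sigma_tilde[k:] ** 2))
# Right side of (6.1): squared Frobenius norm of E.
bound = float(np.linalg.norm(E, "fro") ** 2)

# Heuristic rank estimate: count singular values larger than the error size.
numerical_rank = int(np.sum(sigma_tilde > np.linalg.norm(E, "fro")))

print(f"trailing sum = {trailing:.3e}, bound = {bound:.3e}")
print("estimated rank:", numerical_rank)
```

The gap between the $k$th and $(k+1)$th singular values of $\tilde{A}$ is what makes the rank visible; when the error is comparable to the smallest nonzero singular value of $A$, no threshold can separate the two groups reliably.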

The inequality (6.1) is actually a perturbation theorem for the zero singular values of a matrix. Weyl’s contribution to the theory of the singular value decomposition was to develop a general perturbation theory and use it to give an elegant proof of the approximation theorem. Although Weyl treated integral equations with symmetric kernels, in a footnote on Schmidt’s contribution he states, “E. Schmidt’s theorem, by the way, treats arbitrary (unsymmetric) kernels; however, our proof can also be applied directly to this more general case.” Since here we are concerned with the more general case, we will paraphrase Weyl’s development as he might have written it for unsymmetric matrices.

The location of singular values

The heart of Weyl’s development is a lemma concerning the singular values of a perturbed matrix. Specifically, if $B_k = XY^T$, where $X$ and $Y$ have $k$ columns (i.e., $\text{rank}(B_k) \leq k$), then

$$(6.2) \quad \sigma_1(A - B_k) \geq \sigma_{k+1}(A),$$

where $\sigma_i(\cdot)$ denotes the $i$th singular value of its argument.

The proof is simple. Since $Y$ has $k$ columns, the condition $Y^T v = 0$ imposes only $k$ homogeneous equations, so there is a nonzero linear combination

$$v = \gamma_1 v_1 + \gamma_2 v_2 + \cdots + \gamma_{k+1} v_{k+1}$$

of the first $k+1$ columns of $V$ (from the singular value decomposition of $A$) such that $Y^T v = 0$. Without loss of generality we may assume that $\| v \| = 1$, or equivalently that $\gamma_1^2 + \cdots + \gamma_{k+1}^2 = 1$. Since $B_k v = X Y^T v = 0$, it follows that

$$\begin{aligned} \sigma_1^2(A - B_k) &\geq v^T (A - B_k)^T (A - B_k) v \\ &= v^T A^T A v \\ &= \gamma_1^2 \sigma_1^2 + \gamma_2^2 \sigma_2^2 + \cdots + \gamma_{k+1}^2 \sigma_{k+1}^2 \\ &\geq \sigma_{k+1}^2. \end{aligned}$$
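The construction in this proof can be traced numerically. The following sketch, assuming NumPy and randomly generated test matrices (sizes and seed are arbitrary choices, not from the source), builds the vector $v$ explicitly and checks each step of the chain of inequalities.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 7, 5, 2  # illustrative sizes with k + 1 <= n (assumed)

A = rng.standard_normal((m, n))
X = rng.standard_normal((m, k))
Y = rng.standard_normal((n, k))
B_k = X @ Y.T  # rank(B_k) <= k

# SVD of A: the rows of Vt are the right singular vectors v_1, v_2, ...
_, sigma, Vt = np.linalg.svd(A)
V1 = Vt[: k + 1].T  # first k + 1 columns of V, shape (n, k + 1)

# Y.T @ V1 is k x (k + 1), so its last right singular vector is a unit
# vector gamma in its null space, i.e. Y^T (V1 gamma) = 0.
gamma = np.linalg.svd(Y.T @ V1)[2][-1]
v = V1 @ gamma

lhs = np.linalg.svd(A - B_k, compute_uv=False)[0]  # sigma_1(A - B_k)
witness = np.linalg.norm((A - B_k) @ v)            # equals ||A v|| because B_k v = 0

print(f"sigma_1(A - B_k) = {lhs:.4f} >= ||A v|| = {witness:.4f} >= sigma_(k+1)(A) = {sigma[k]:.4f}")
```

Note that `sigma[k]` is $\sigma_{k+1}(A)$ because NumPy arrays are indexed from zero.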

Weyl then proves two theorems. The first states that if $A = A' + A''$, then

$$(6.3) \quad \sigma_{i+j-1}(A) \leq \sigma_{i}(A') + \sigma_{j}(A''),$$

where the $\sigma_{i}(A')$ and $\sigma_{i}(A'')$ are the singular values of $A'$ and $A''$ arranged in descending order of magnitude. Weyl begins by establishing (6.3) for $i = j = 1$:

$$\sigma_{1}(A) = u_{1}^{T} A v_{1} = u_{1}^{T} A' v_{1} + u_{1}^{T} A'' v_{1} \leq \sigma_{1}(A') + \sigma_{1}(A'').$$

Here, $u_1$ and $v_1$ are the first columns of the unitary matrices in the singular value decomposition of $A$.

To establish the result in general, let $A'_{i-1} = \sum_{m=1}^{i-1} \sigma_{m}(A') u'_m {v'_m}^T$ and $A''_{j-1} = \sum_{m=1}^{j-1} \sigma_{m}(A'') u''_m {v''_m}^T$ be formed in analogy with (5.2). Then $\sigma_{1}(A' - A'_{i-1}) = \sigma_{i}(A')$ and $\sigma_{1}(A'' - A''_{j-1}) = \sigma_{j}(A'')$. Moreover, $\text{rank}(A'_{i-1} + A''_{j-1}) \leq (i-1) + (j-1) = i+j-2$. From these facts and from (6.2) it follows that

$$\begin{aligned} \sigma_{i}(A') + \sigma_{j}(A'') &= \sigma_{1}(A' - A'_{i-1}) + \sigma_{1}(A'' - A''_{j-1}) \\ &\geq \sigma_{1}\left( (A' - A'_{i-1}) + (A'' - A''_{j-1}) \right) \\ &= \sigma_{1}\left( A - (A'_{i-1} + A''_{j-1}) \right) \\ &\geq \sigma_{(i+j-2)+1}(A) = \sigma_{i+j-1}(A), \end{aligned}$$

which proves the theorem. The first inequality in this chain is the case $i = j = 1$ already established, and the last is (6.2) applied with $k = i+j-2$.
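Inequality (6.3) is easy to spot-check for every admissible pair of indices. The sketch below assumes NumPy and two random square matrices standing in for $A'$ and $A''$ (names `A1` and `A2` are hypothetical); it is a sanity check on one example, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A1 = rng.standard_normal((n, n))  # plays the role of A'
A2 = rng.standard_normal((n, n))  # plays the role of A''

s = np.linalg.svd(A1 + A2, compute_uv=False)  # singular values of A = A' + A''
s1 = np.linalg.svd(A1, compute_uv=False)
s2 = np.linalg.svd(A2, compute_uv=False)

# Collect every (i, j), in the 1-based indexing of (6.3), that violates
# sigma_{i+j-1}(A) <= sigma_i(A') + sigma_j(A'') beyond rounding error.
violations = [
    (i, j)
    for i in range(1, n + 1)
    for j in range(1, n + 1)
    if i + j - 1 <= n and s[i + j - 2] > s1[i - 1] + s2[j - 1] + 1e-12
]

print("violations of (6.3):", violations)
```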

The second theorem is really a corollary of the first. Set $A' = A - B_k$ and $A'' = B_k$, where, as above, $B_k$ has rank $\leq k$. Since $\sigma_{1}(A'') = \sigma_{1}(B_k) \leq \| B_k \|$ and $\sigma_{k+1}(A'') = 0$ (because $\text{rank}(B_k) \leq k$), we have on setting $j = k+1$ in (6.3),

$$\sigma_{i}(A - B_k) \geq \sigma_{k+i}(A), \quad i = 1, 2, \dots$$

As a corollary to this result, obtained by squaring these inequalities and summing over $i$ (the squared norm used here being the sum of the squared singular values), we obtain

$$\| A - B_k \|^{2} \geq \sigma_{k+1}^{2}(A) + \cdots + \sigma_{n}^{2}(A).$$

This inequality is equivalent to (5.3) and thus establishes the approximation theorem.
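The lower bound can be compared numerically with the error of the truncated singular value decomposition, which attains it. The sketch below assumes NumPy, the Frobenius norm, and that the matrix $A_k$ of (5.2) is the sum of the $k$ leading singular triplets; the random competitors $XY^T$ are illustrative and do not search the space of rank-$k$ matrices exhaustively.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 8, 6, 2  # illustrative sizes (assumed)

A = rng.standard_normal((m, n))
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Truncated SVD: keep the k largest singular triplets.
A_k = (U[:, :k] * sigma[:k]) @ Vt[:k]

# Right side of the corollary: sigma_{k+1}^2 + ... + sigma_n^2.
tail = float(np.sum(sigma[k:] ** 2))
err_truncated = float(np.linalg.norm(A - A_k, "fro") ** 2)

# Squared errors of randomly chosen competitors B_k = X Y^T of rank <= k.
err_random = []
for _ in range(100):
    B_k = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
    err_random.append(float(np.linalg.norm(A - B_k, "fro") ** 2))

print(f"tail = {tail:.4f}, truncated-SVD error = {err_truncated:.4f}")
print(f"best random competitor error = {min(err_random):.4f}")
```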

Discussion

Weyl did not actually write down the development for unsymmetric kernels, and we remind the reader once again of the advisability of consulting original sources. In particular, since symmetric kernels can have negative eigenvalues as well as positive ones, Weyl wrote down three sequences of inequalities: one for positive eigenvalues, one for negative, and one (corresponding to the inequalities presented here) for the absolute values of the eigenvalues.

Returning to the perturbation problem that opened this section, if in (6.3) we make the identifications $A \leftarrow \tilde{A}$, $A' \leftarrow A$, $A'' \leftarrow E$, and then set $j = 1$, we get

$$\tilde{\sigma}_{i} \leq \sigma_{i} + \| E \|_2,$$

where $\| E \|_2 = \sigma_1(E)$ is the spectral norm of $E$. On the other hand, if we make the identifications $A' \leftarrow \tilde{A}$ and $A'' \leftarrow -E$ (so that $A = \tilde{A} - E$), then we get

$$\sigma_{i} \leq \tilde{\sigma}_{i} + \| E \|_2.$$

It follows that

$$| \tilde{\sigma}_{i} - \sigma_{i} | \leq \| E \|_2, \quad i = 1, 2, \dots, n.$$

Thus Weyl’s result implies that if the singular values of $A$ and $\tilde{A}$ are associated in their natural order (largest to smallest), they cannot differ by more than the spectral norm of the perturbation.
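A quick check of this perturbation bound, assuming NumPy and an arbitrary random matrix and perturbation (sizes, seed, and scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))
E = 1e-3 * rng.standard_normal((6, 4))

# Singular values in their natural (descending) order.
s = np.linalg.svd(A, compute_uv=False)
s_tilde = np.linalg.svd(A + E, compute_uv=False)

# Spectral norm of E, i.e. its largest singular value.
spectral_norm_E = float(np.linalg.norm(E, 2))
max_shift = float(np.max(np.abs(s_tilde - s)))

print(f"largest shift = {max_shift:.3e}, ||E||_2 = {spectral_norm_E:.3e}")
```

The bound is absolute, not relative: a small singular value may change by a large factor even though its absolute shift stays within $\| E \|_2$.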

7. Envoi

With Weyl’s contribution, the theory of the singular value decomposition can be said to have matured. The subsequent history is one of extensions, new discoveries, and applications. What follows is a brief, selective sketch of these later developments.

Extensions

Autonne [2, 1913] extended the decomposition to complex matrices. Eckart and Young [16, 1936], [17, 1939] extended it to rectangular matrices and rediscovered Schmidt’s approximation theorem, which is often (and incorrectly) called the Eckart-Young theorem.

8. Nomenclature$^7$

The term “singular value” seems to have come from the literature on integral equations. A little after the appearance of Schmidt’s paper, Bateman [4, 1908] refers to numbers that are essentially the reciprocals of the eigenvalues of the kernel $\underline{A}(s,t)$ as singular values. Picard [45, 1909] combined Schmidt’s results with Riesz’s theorem on the strong convergence of generalized Fourier series [48, 1907] to establish a necessary and sufficient condition for the existence of solutions of integral equations. In a later paper on the same subject [46, 1910], he notes that for symmetric kernels Schmidt’s eigenvalues are real and in this case (but not in general) he calls them singular values. By 1937, Smithies [53] was referring to singular values of an integral equation in our modern sense of the word. Even at this point, usage had not stabilized. In 1949, Weyl [65] speaks of the “two kinds of eigenvalues of a linear transformation,” and in a 1969 translation of a 1965 Russian treatise on nonselfadjoint operators, Gohberg and Krein [21] refer to the “s-numbers” of an operator. For the term “principal component,” see below.

$^7$ Parts of this passage were taken from [55, p. 35].

Although it is not, strictly speaking, a matrix decomposition, the Moore-Penrose pseudoinverse [41, 1920], [44, 1955] can be calculated from the singular value decomposition of a matrix as follows. Suppose that the first $k$ singular values of $A$ are nonzero while the last $n - k$ are zero, and set $\Sigma^\dagger = \text{diag}(\sigma_1^{-1}, \dots, \sigma_k^{-1}, 0, \dots, 0)$. Then the pseudoinverse of $A$ is

$$A^\dagger = V \Sigma^\dagger U^T.$$
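The formula translates almost directly into code. The sketch below assumes NumPy and a hypothetical rank-deficient test matrix; because computed singular values are never exactly zero, it treats values below a relative tolerance as zero, and that tolerance (`1e-10` times the largest singular value) is an assumption of this example. The result is verified against the four Penrose conditions.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # rank 2 by construction

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Invert the nonzero singular values; treat tiny ones as zero (assumed tolerance).
tol = 1e-10 * sigma[0]
sigma_dagger = np.array([1.0 / s if s > tol else 0.0 for s in sigma])

# A^dagger = V Sigma^dagger U^T
A_dagger = (Vt.T * sigma_dagger) @ U.T

# The four Penrose conditions that characterize the pseudoinverse.
penrose_ok = bool(
    np.allclose(A @ A_dagger @ A, A)
    and np.allclose(A_dagger @ A @ A_dagger, A_dagger)
    and np.allclose((A @ A_dagger).T, A @ A_dagger)
    and np.allclose((A_dagger @ A).T, A_dagger @ A)
)

print("Penrose conditions hold:", penrose_ok)
```

In practice one would more likely call a library routine such as `np.linalg.pinv`, which follows the same idea with its own cutoff rule.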

Unitarily invariant norms

A matrix norm $\| \cdot \|_u$ is unitarily invariant if $\| U A V \|_u = \| A \|_u$ for all unitary matrices $U$ and $V$. A vector norm $\| \cdot \|_g$ is a symmetric gauge function if $\| P x \|_g = \| x \|_g$ for any permutation matrix $P$ and $\| |x| \|_g = \| x \|_g$ (where $|x|$ denotes the vector of absolute values of the components of $x$). Von Neumann [61, 1937] showed that to any unitarily invariant norm $\| \cdot \|_u$ there corresponds a symmetric gauge function $\| \cdot \|_g$ such that $\| A \|_u = \| (\sigma_1, \dots, \sigma_n)^T \|_g$; i.e., a unitarily invariant norm is a symmetric gauge function of the singular values of its argument.
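Three familiar examples make the correspondence concrete: the Frobenius, spectral, and nuclear norms of a matrix are the 2-norm, max-norm, and 1-norm of its vector of singular values. The sketch below, assuming NumPy and random real test matrices (orthogonal matrices stand in for unitary ones), checks both the invariance and the gauge-function identity; the choice of these three norms is an illustration, not a list from the source.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
A = rng.standard_normal((n, n))

# Random orthogonal matrices obtained from QR factorizations.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))

sigma = np.linalg.svd(A, compute_uv=False)

# name -> (matrix norm order, vector norm order applied to the singular values)
pairs = {"Frobenius": ("fro", 2), "spectral": (2, np.inf), "nuclear": ("nuc", 1)}

results = {}
for name, (mat_ord, vec_ord) in pairs.items():
    results[name] = (
        float(np.linalg.norm(A, mat_ord)),          # ||A||_u
        float(np.linalg.norm(U @ A @ V, mat_ord)),  # ||U A V||_u
        float(np.linalg.norm(sigma, vec_ord)),      # gauge function of sigma
    )

for name, values in results.items():
    print(name, values)
```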

Approximation theorems

Schmidt’s approximation theorem has been generalized in a number of directions. Mirsky [40, 1960] showed that $A_k$ of (5.2) is a minimizing matrix in any unitarily invariant norm. Cases where further restrictions are imposed on the minimizing matrix are treated in [12], [22], and [47].

Given matrices $A$ and $B$, the Procrustes problem, which arises in the statistical method of factor analysis, is that of determining a unitary matrix $Q$ such that $\| A - BQ \|$ is minimized (see [29, 1962]). Green [25, 1952] and Schoneman [51, 1966] showed that if $U^T A^T B V = \Sigma$ is the singular value decomposition of $A^T B$, then the minimizing matrix is $Q = V U^T$. Rao [47, 1980] considers the more general problem of minimizing $\| P A - B Q \|$, where $P$ and $Q$ are orthogonal.
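The Procrustes solution $Q = V U^T$ can be tried on synthetic data. The sketch below assumes NumPy, real matrices, and the Frobenius norm; the matrices `B`, `Q_true`, and the noise level are hypothetical test data. It compares the residual of the computed $Q$ with that of the generating rotation and of random orthogonal candidates.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 10, 3

B = rng.standard_normal((m, n))
Q_true, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = B @ Q_true + 0.01 * rng.standard_normal((m, n))  # noisy rotated copy of B

# NumPy returns A^T B = U diag(sigma) V^T, i.e. U^T (A^T B) V = Sigma.
U, sigma, Vt = np.linalg.svd(A.T @ B)
Q = Vt.T @ U.T  # Q = V U^T


def residual(Q_candidate):
    """Frobenius norm of A - B Q for a candidate orthogonal matrix."""
    return float(np.linalg.norm(A - B @ Q_candidate, "fro"))


# Competitors: the generating rotation and some random orthogonal matrices.
candidates = [Q_true] + [
    np.linalg.qr(rng.standard_normal((n, n)))[0] for _ in range(50)
]
best_other = min(residual(Qc) for Qc in candidates)

print(f"residual(Q) = {residual(Q):.5f}, best competitor = {best_other:.5f}")
```

Because of the added noise, the computed $Q$ is close to, but generally not equal to, `Q_true`, and its residual is no larger.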

Principal components

An alternative to factor analysis is the principal component analysis of Hotelling [27, 1933]. Specifically, if $x^T$ is a (vector-valued) random variable with mean zero and common dispersion matrix $D$, and $D = V \Sigma V^T$ is the eigenvalue-eigenvector decomposition of $D$, then the components of $x^T V$ are uncorrelated with variances $\sigma_i$. Hotelling called the transformed variables “the principal components of variance” of $x^T$. If the rows of $X$ consist of independent copies of $x^T$, then the expectation of $X^T X$ is proportional to $D$. It follows that the matrix $\hat{V}$ obtained from the singular value decomposition of $X$ is an estimate of $V$.
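The estimate $\hat{V}$ can be illustrated by simulation. The sketch below assumes NumPy and Gaussian samples with a made-up dispersion matrix (the variances 9, 4, 1 and the sample size are arbitrary choices); it draws the rows of $X$, takes the singular value decomposition, and compares the right singular vectors and scaled squared singular values with the true eigenvectors and variances.

```python
import numpy as np

rng = np.random.default_rng(8)
m, p = 20000, 3
variances = np.array([9.0, 4.0, 1.0])             # diagonal of Sigma (assumed)
V, _ = np.linalg.qr(rng.standard_normal((p, p)))  # orthogonal eigenvector matrix

# Each row of X is an independent sample with dispersion D = V diag(variances) V^T.
X = (rng.standard_normal((m, p)) * np.sqrt(variances)) @ V.T

_, s, Vt_hat = np.linalg.svd(X, full_matrices=False)
V_hat = Vt_hat.T

# Singular vectors are determined only up to sign, so compare absolute inner products.
alignment = np.abs(V_hat.T @ V)
# Since E[X^T X] = m D, the squared singular values divided by m estimate the variances.
variance_estimates = s**2 / m

print(np.round(alignment, 3))
print(np.round(variance_estimates, 3))
```

With well-separated variances the alignment matrix is close to the identity; when two variances nearly coincide, the corresponding directions are poorly determined and the estimate degrades.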

Hotelling [28, 1936] also introduced canonical correlations between two sets of random variables that bear the same relation to the generalized singular value decomposition as his principal components bear to the singular value decomposition.

Inequalities involving singular values. Just as Schmidt did not have the last word on approximation theorems, Weyl was not the last to work on inequalities involving singular values. The subject is too voluminous to treat here, and we refer the reader to the excellent survey with references in [26, Chap. 3]. However, mention should be made of a line of research initiated by Weyl [65, 1949] relating the singular values and eigenvalues of a matrix.

Computational methods. The singular value decomposition was introduced into numerical analysis by Golub and Kahan [23, 1965], who proposed a computational algorithm. However, it was Golub and Reinsch [24, 1970] who gave the algorithm that has been the workhorse of the past two decades. Recently, Demmel and Kahan [13, 1990] have proposed an interesting alternative.

Sources. For short biographies of the principals see the Dictionary of Scientific Biography [6], and particularly the articles [6], [14], [15], [42], and [56]. The nearest thing to a systematic survey of the development of matrix decompositions is the chapter on determinants and matrices in Kline’s Mathematical Thought from Ancient to Modern Times [35, Chap. 33]. Mac Duffee’s book, The Theory of Matrices [39], is a gold mine of references to the older literature.

Acknowledgments. I would like to thank Anne Greenbaum, Nick Higham, David Wood, and Hongyuan Zha for reading and commenting on the manuscript.

REFERENCES

参考文献

[1] L. AUTONNE, Sur les groupes linéaires, réels et orthogonaux, Bull. Soc. Math. France, 30 (1902), pp. 121-134.
[1] L. AUTONNE，《论实正交线性群》（Sur les groupes linéaires, réels et orthogonaux），《法国数学会通报》（Bull. Soc. Math. France），第 30 卷（1902 年），第 121 - 134 页。

[2] , Sur les matrices hypohermitiennes et les unitaires, Comptes Rendus de l’Académie Sciences, Paris, 156 (1913), pp. 858-860.
[2] 同作者（L. AUTONNE），《论次埃尔米特矩阵与酉矩阵》（Sur les matrices hypohermitiennes et les unitaires），《法国科学院院报》（Comptes Rendus de l’Académie Sciences, Paris），第 156 卷（1913 年），第 858 - 860 页。

[3] ., Sur les matrices hypohermitiennes et sur les matrices unitaires, Ann. Univ. Lyons, Nouvelle Série I, 38 (1915), pp. 1-77.
[3] 同作者（L. AUTONNE），《论次埃尔米特矩阵与酉矩阵》（Sur les matrices hypohermitiennes et sur les matrices unitaires），《里昂大学年报》（Ann. Univ. Lyons, Nouvelle Série I），第 38 卷（1915 年），第 1 - 77 页。

[4] H. BATEMAN, A formula for the solving function of a certain integral equation of the second kind, Trans. Cambridge Philos. Soc., 20 (1908), pp. 179-187.
[4] H. BATEMAN，《一类第二类积分方程求解函数的公式》（A formula for the solving function of a certain integral equation of the second kind），《剑桥哲学学会会刊》（Trans. Cambridge Philos. Soc.），第 20 卷（1908 年），第 179 - 187 页。

[5] E. BELTRAMI, Sulle funzioni bilineari, Giornale di Matematiche ad Uso degli Studenti Delle Università, 11 (1873), pp. 98-106. An English translation by D. Boley is available as University of Minnesota, Department of Computer Science, Minneapolis, MN, Technical Report 90-37, 1990.
[5] E. BELTRAMI，《论双线性函数》（Sulle funzioni bilineari），《大学学生用数学期刊》（Giornale di Matematiche ad Uso degli Studenti Delle Università），第 11 卷（1873 年），第 98 - 106 页。D. Boley 已将该文译为英文，可参见美国明尼苏达大学计算机科学系（明尼阿波利斯市，明尼苏达州）1990 年的技术报告 90 - 37。

[6] M. BEREKOF, Schmidt, Erhard, in Dictionary of Scientific Biography XII, C. C. Gillispe, ed., Charles Scribner’s Sons, New York, 1975.
[6] M. BEREKOF，《埃哈德·施密特》（Schmidt, Erhard），收录于 C. C. Gillispe 主编的《科学传记词典》（Dictionary of Scientific Biography）第 12 卷，查尔斯·斯克里布纳之子出版社（Charles Scribner’s Sons），纽约，1975 年。

[7] A.L. CAUCHY, Sur l’équation à l’aide de laquelle on détermine les inégalités séculaires des mouvements des planètes, in Oeuvres Complètes (II Série), Vol. 9, 1829.
[7] A.L. 柯西（CAUCHY），《论用于确定行星运动长期不等式的方程》（Sur l’équation à l’aide de laquelle on détermine les inégalités séculaires des mouvements des planètes），收录于《柯西全集》（Oeuvres Complètes）第二辑，第 9 卷，1829 年。

[8] M. CHU, A differential equation approach to the singular value decomposition of bidiagonal matrices, Linear Algebra Appl., 80 (1986), pp. 71-79.
[8] M. CHU，《用微分方程方法求解双对角矩阵的奇异值分解》（A differential equation approach to the singular value decomposition of bidiagonal matrices），《线性代数及其应用》（Linear Algebra Appl.），第 80 卷（1986 年），第 71 - 79 页。

[9] C. DAVIS AND W. KAHAN, The rotation of eigenvectors by a perturbation. III, SIAM J. Numer. Anal., 7 (1970), pp. 1-46.
[9] C. DAVIS 与 W. KAHAN，《摄动引起的特征向量旋转（第三部分）》（The rotation of eigenvectors by a perturbation. III），《美国工业与应用数学学会数值分析期刊》（SIAM J. Numer. Anal.），第 7 卷（1970 年），第 1 - 46 页。

[10] B. DE MOOR, A tree of generalizations of the ordinary singular value decomposition, Linear Algebra Appl., 147 (1991), pp. 469-500.
[10] B. DE MOOR，《普通奇异值分解的推广体系》（A tree of generalizations of the ordinary singular value decomposition），《线性代数及其应用》（Linear Algebra Appl.），第 147 卷（1991 年），第 469 - 500 页。

[11] P. DEIFT, J. DEMMEL, L.-C. LI, And C. TOMEI, The bidiagonal singular value decomposition and Hamiltonian mechanics, SIAM J. Numer. Anal., 28 (1991), pp. 1463-1516.
[11] P. DEIFT、J. DEMMEL、L.-C. LI（李）与 C. TOMEI，《双对角矩阵奇异值分解与哈密顿力学》（The bidiagonal singular value decomposition and Hamiltonian mechanics），《美国工业与应用数学学会数值分析期刊》（SIAM J. Numer. Anal.），第 28 卷（1991 年），第 1463 - 1516 页。

[12] J. DEMMEL, The smallest perturbation of a submatrix which lowers the rank and constrained total least squares problems, SIAM J. Numer. Anal., 24 (1987), pp. 199-206.
[12] J. DEMMEL，《使子矩阵秩降低的最小摄动及约束总体最小二乘问题》（The smallest perturbation of a submatrix which lowers the rank and constrained total least squares problems），《美国工业与应用数学学会数值分析期刊》（SIAM J. Numer. Anal.），第 24 卷（1987 年），第 199 - 206 页。

[13] J. DEMMEL AND W. KAHAN, Accurate singular values of bidiagonal matrices, SIAM J. Sci. Statist. Comput., 11 (1989), pp. 873-912.
[13] J. DEMMEL 与 W. KAHAN，《双对角矩阵的精确奇异值》（Accurate singular values of bidiagonal matrices），《美国工业与应用数学学会科学与统计计算期刊》（SIAM J. Sci. Statist. Comput.），第 11 卷（1989 年），第 873 - 912 页。

[14] J. DIEUDONNÉ, Jordan, Camille, in Dictionary of Scientific Biography VII, C. C. Gillispe, ed., Charles Scribner’s Sons, New York, 1973.
[14] J. DIEUDONNÉ，《卡米耶·若尔当》（Jordan, Camille），收录于 C. C. Gillispe 主编的《科学传记词典》（Dictionary of Scientific Biography）第 7 卷，查尔斯·斯克里布纳之子出版社（Charles Scribner’s Sons），纽约，1973 年。

[15] Weyl, Hermann, in Dictionary of Scientific Biography XIV, C. C. Gillispe, ed., Charles Scribner’s Sons, New York, 1976.
[15] 《赫尔曼·外尔》（Weyl, Hermann），收录于 C. C. Gillispe 主编的《科学传记词典》（Dictionary of Scientific Biography）第 14 卷，查尔斯·斯克里布纳之子出版社（Charles Scribner’s Sons），纽约，1976 年。

[16] C. ECKART AND G. YOUNG, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211-218.
[16] C. ECKART 与 G. YOUNG，《用低秩矩阵逼近给定矩阵》（The approximation of one matrix by another of lower rank），《心理测量学》（Psychometrika），第 1 卷（1936 年），第 211 - 218 页。

[17] ., A principal axis transformation for non-Hermitian matrices, Bull. Amer. Math. Soc., 45 (1939), pp. 118-121.
[17] 同作者（C. ECKART 与 G. YOUNG），《非埃尔米特矩阵的主轴变换》（A principal axis transformation for non-Hermitian matrices），《美国数学会通报》（Bull. Amer. Math. Soc.），第 45 卷（1939 年），第 118 - 121 页。

[18] K. FAN AND A. J. HOFFMAN, Some metric inequalities in the space of matrices, Proc. Amer. Math. Soc., 6 (1955), pp. 111-116.
[18] K. FAN（樊畿）与 A. J. HOFFMAN，《矩阵空间中的若干度量不等式》（Some metric inequalities in the space of matrices），《美国数学会会议录》（Proc. Amer. Math. Soc.），第 6 卷（1955 年），第 111 - 116 页。

[19] C. F. GAUSS, Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium, Perthes and Besser, Hamburg, Germany, 1809.
[19] C. F. 高斯（GAUSS），《天体在圆锥截面上绕太阳运动的理论》（Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium），珀特斯与贝瑟出版社（Perthes and Besser），德国汉堡，1809 年。

[20] Theoria combinationis observationum erroribus minimis obnoxiae, pars posterior, in Werke, IV, Königlichen Gesellschaft der Wissenschaften zu Göttingen (1880), 1823, pp. 27-53.
[20] 《关于以最小误差组合观测值的理论（续篇）》（Theoria combinationis observationum erroribus minimis obnoxiae, pars posterior），收录于《高斯全集》（Werke）第 4 卷，哥廷根皇家科学学会（Königlichen Gesellschaft der Wissenschaften zu Göttingen），1880 年（原文发表于 1823 年），第 27 - 53 页。

[21] I. C. GOHBERG AND M. G. KREIN, Introduction to the Theory of Linear Nonselfadjoint Operators, American Mathematical Society, Providence, RI, 1969.
[21] I. C. 戈贝尔格（GOHBERG）与 M. G. 克列因（KREIN），《线性非自伴算子理论导论》（Introduction to the Theory of Linear Nonselfadjoint Operators），美国数学会（American Mathematical Society），罗得岛州普罗维登斯，1969 年。

[22] G.H. GOLUB, A. HOFFMAN, AND G. W. STEWART, A generalization ofthe Eckart-Young matrix approximation theorem, Linear Algebra Appl., 88/89 (1987), pp. 317-327.
[22] G.H. GOLUB、A. HOFFMAN 与 G. W. STEWART（斯图尔特），《埃卡特 - 杨矩阵逼近定理的推广》（A generalization of the Eckart-Young matrix approximation theorem），《线性代数及其应用》（Linear Algebra Appl.），第 88/89 卷（1987 年），第 317 - 327 页。

[23] G.H. GOLUB AND W. KAHAN, Calculating the singular values and pseudo-inverse of a matrix, SIAM J. Numer. Anal., 2 (1965), pp. 205-224.
[23] G.H. GOLUB 与 W. KAHAN，《矩阵奇异值与伪逆的计算》（Calculating the singular values and pseudo-inverse of a matrix），《美国工业与应用数学学会数值分析期刊》（SIAM J. Numer. Anal.），第 2 卷（1965 年），第 205 - 224 页。

[24] G. H. GOLUB AND C. REINSCH, Singular value decomposition and least squares solution, Numer. Math., 14 (1970), pp. 403-420; also in [66, pp.134-151].
[24] G. H. GOLUB 与 C. REINSCH，《奇异值分解与最小二乘解》（Singular value decomposition and least squares solution），《数值数学》（Numer. Math.），第 14 卷（1970 年），第 403 - 420 页；该文亦收录于文献 [66, 第 134 - 151 页]。

[25] B. F. GREEN, The orthogonal approximation of the oblique structure in factor analysis, Psychometrika, 17 (1952), pp. 429-440.
[25] B. F. GREEN，《因子分析中斜交结构的正交逼近》（The orthogonal approximation of the oblique structure in factor analysis），《心理测量学》（Psychometrika），第 17 卷（1952 年），第 429 - 440 页。

[26] R. A. HORN AND C. R. JOHNSON, Topics in Matrix Analysis, Cambridge University Press, Cambridge, UK, 1991.
[26] R. A. HORN 与 C. R. JOHNSON，《矩阵分析专题》（Topics in Matrix Analysis），剑桥大学出版社（Cambridge University Press），英国剑桥，1991 年。

[27] H. HOTELLING, Analysis of a complex of statistical variables into principal components, J. Ed. Psych., 24 (1933), pp. 417-441 and 498-520.
[27] H. HOTELLING，《将一组复杂统计变量分解为主成分》（Analysis of a complex of statistical variables into principal components），《教育心理学杂志》（J. Ed. Psych.），第 24 卷（1933 年），第 417 - 441 页及第 498 - 520 页。

[28] Relation between two sets of variates, Biometrika, 28 (1936), pp. 322-377.
[28] 同作者（H. HOTELLING），《两组变量间的关系》（Relation between two sets of variates），《生物计量学》（Biometrika），第 28 卷（1936 年），第 322 - 377 页。

[29] J. R. HURLEY AND R. B. CATTELL, The Procrustes program: Direct rotation to test a hypothesized factor structure, Behav. Sci., 7 (1962), pp. 258-262.
[29] J. R. HURLEY 与 R. B. CATTELL，《普罗克拉斯提斯程序：用于检验假设因子结构的直接旋转法》（The Procrustes program: Direct rotation to test a hypothesized factor structure），《行为科学》（Behav. Sci.），第 7 卷（1962 年），第 258 - 262 页。

[30] C.G.J. JACOBI, Über ein leichtes Verfahren die in der Theorie der Seculärstörungen vorkommenden Gleichungen numerisch aufzulösen, J. Reine Angew. Math., 30 (1846), pp. 51-94.
[30] C.G.J. 雅可比（JACOBI），《一种求解长期摄动理论中出现的方程的简便数值方法》（Über ein leichtes Verfahren die in der Theorie der Seculärstörungen vorkommenden Gleichungen numerisch aufzulösen），《纯粹与应用数学杂志》（J. Reine Angew. Math.），第 30 卷（1846 年），第 51 - 94 页。

[31] Über eine elementare Transformation eines in Bezug jedes von zwei Variablen-Systemen linearen und homogenen Ausdrucks, J. Reine Angew. Math., 53 (1857, posthumous), pp. 265-270.
[31] 同作者（C.G.J. 雅可比），《关于对两个变量系统均为线性齐次的表达式的初等变换》（Über eine elementare Transformation eines in Bezug jedes von zwei Variablen-Systemen linearen und homogenen Ausdrucks），《纯粹与应用数学杂志》（J. Reine Angew. Math.），第 53 卷（1857 年，遗作），第 265 - 270 页。

[32] C. JORDAN, Mémoire sur les formes bilinéaires, J. Math. Pures Appl., Deuxième Série, 19 (1874), pp. 35-54.
[32] C. JORDAN（若尔当），《关于双线性形式的论文》（Mémoire sur les formes bilinéaires），《纯粹与应用数学杂志》（J. Math. Pures Appl.），第二辑，第 19 卷（1874 年），第 35 - 54 页。

[33] Sur la réduction des formes bilinéaires, Comptes Rendus de l’Académie Sciences, Paris, 78 (1874), pp. 614-617.
[33] 同作者（C. JORDAN），《论双线性形式的约化》（Sur la réduction des formes bilinéaires），《法国科学院院报》（Comptes Rendus de l’Académie Sciences, Paris），第 78 卷（1874 年），第 614 - 617 页。

[34] Essai sur la géométrie à n dimensions, Bull. Soc. Math., 3 (1875), pp. 103-174.
[34] 同作者（C. JORDAN），《n 维几何学初探》（Essai sur la géométrie à n dimensions），《数学会通报》（Bull. Soc. Math.），第 3 卷（1875 年），第 103 - 174 页。

[35] M. KLINE, Mathematical Thought from Ancient to Modern Times, Oxford University Press, New York, 1972.
[35] M. KLINE（克莱因），《古今数学思想》（Mathematical Thought from Ancient to Modern Times），牛津大学出版社（Oxford University Press），纽约，1972 年。

[36] E.G. KOGANOVICH, Solution of linear systems by diagonalization of coefficients matrix, Quart. Appl. Math., 13 (1955), pp. 123-132.
[36] E.G. KOGANOVICH，《通过系数矩阵对角化求解线性方程组》（Solution of linear systems by diagonalization of coefficients matrix），《应用数学季刊》（Quart. Appl. Math.），第 13 卷（1955 年），第 123 - 132 页。

[37] L. KRONECKER, Über bilineare Formen, Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften zu Berlin, (1866), pp. 597-613.
[37] L. KRONECKER（克罗内克），《论双线性形式》（Über bilineare Formen），《柏林皇家普鲁士科学院会议报告》（Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften zu Berlin），1866 年，第 597 - 613 页。

[38] C. LANCZOS, Linear systems in self-adjoint form, Amer. Math. Monthly, 65 (1958), pp. 665-679.
[38] C. LANCZOS（兰佐斯），《自伴形式的线性方程组》（Linear systems in self-adjoint form），《美国数学月刊》（Amer. Math. Monthly），第 65 卷（1958 年），第 665 - 679 页。

[39] C.C. MAC DUFFEE, The Theory of Matrices, Chelsea, New York, 1946.
[39] C.C. MAC DUFFEE，《矩阵理论》（The Theory of Matrices），切尔西出版社（Chelsea），纽约，1946 年。

[40] L. MIRSKY, Symmetric gauge functions and unitarily invariant norms, Quart. J. Math., 11 (1960), pp. 50-59.
[40] L. MIRSKY，《对称规范函数与酉不变范数》（Symmetric gauge functions and unitarily invariant norms），《数学季刊》（Quart. J. Math.），第 11 卷（1960 年），第 50 - 59 页。

[41] E.H. MOORE, On the reciprocal of the general algebraic matrix, Bull. Amer. Math. Soc., 26 (1920), pp. 394-395.
[41] E.H. MOORE（穆尔），《关于一般代数矩阵的逆》（On the reciprocal of the general algebraic matrix），《美国数学会通报》（Bull. Amer. Math. Soc.），第 26 卷（1920 年），第 394 - 395 页。

[42] J.D. NORTH, Sylvester, James Joseph, in Dictionary of Scientific Biography XIII, C. C. Gillispie, ed., Charles Scribner’s Sons, New York, 1976.
[42] J.D. NORTH，《詹姆斯·约瑟夫·西尔维斯特》（Sylvester, James Joseph），收录于 C. C. Gillispie 主编的《科学传记词典》（Dictionary of Scientific Biography）第 13 卷，查尔斯·斯克里布纳之子出版社（Charles Scribner’s Sons），纽约，1976 年。

[43] C.C. PAIGE AND M. A. SAUNDERS, Towards a generalized singular value decomposition, SIAM J. Numer. Anal., 18 (1981), pp. 398-405.
[43] C.C. PAIGE 与 M. A. SAUNDERS，《广义奇异值分解的研究》（Towards a generalized singular value decomposition），《美国工业与应用数学学会数值分析期刊》（SIAM J. Numer. Anal.），第 18 卷（1981 年），第 398 - 405 页。

[44] R. PENROSE, A generalized inverse for matrices, Proc. Cambridge Philos. Soc., 51 (1955), pp. 406-413.
[44] R. PENROSE（彭罗斯），《矩阵的广义逆》（A generalized inverse for matrices），《剑桥哲学学会会刊》（Proc. Cambridge Philos. Soc.），第 51 卷（1955 年），第 406 - 413 页。

[45] E. PICARD, Quelques remarques sur les équations intégrales de première espèce et sur certains problèmes de Physique mathématique, Comptes Rendus de l’Académie des Sciences, Paris, 148 (1909), pp. 1563-1568.
[45] E. PICARD（皮卡），《关于第一类积分方程及若干数学物理问题的几点注记》（Quelques remarques sur les équations intégrales de première espèce et sur certains problèmes de Physique mathématique），《法国科学院院报》（Comptes Rendus de l’Académie des Sciences, Paris），第 148 卷（1909 年），第 1563 - 1568 页。

[46] E. PICARD, Sur un théorème général relatif aux équations intégrales de première espèce et sur quelques problèmes de physique mathématique, Rend. Circ. Mat. Palermo, 25 (1910), pp. 79-97.
[46] 同作者（E. PICARD），《关于第一类积分方程的一个一般定理及若干数学物理问题》（Sur un théorème général relatif aux équations intégrales de première espèce et sur quelques problèmes de physique mathématique），《帕勒莫数学通讯》（Rend. Circ. Mat. Palermo），第 25 卷（1910 年），第 79 - 97 页。

[47] C. R. RAO, Matrix approximations and reduction of dimensionality in multivariate statistical analysis, in Multivariate Analysis, V, P. R. Krishnaiah, ed., North Holland, Amsterdam, 1980.
[47] C. R. RAO（罗），《多元统计分析中的矩阵逼近与降维》（Matrix approximations and reduction of dimensionality in multivariate statistical analysis），收录于 P. R. Krishnaiah 主编的《多元分析（第五卷）》（Multivariate Analysis, V），北荷兰出版社（North Holland），阿姆斯特丹，1980 年。

[48] F. RIESZ, Über orthogonale Funktionensysteme, Göttinger Nachr., (1907), pp. 116-122. Cited in [49].
[48] F. RIESZ（里斯），《论正交函数系》（Über orthogonale Funktionensysteme），《哥廷根通讯》（Göttinger Nachr.），1907 年，第 116 - 122 页。该文献被文献 [49] 引用。

[49] F. RIESZ AND B. SZ.-NAGY, L. F. Boron, trans., Functional Analysis, Ungar, New York, 1955.
[49] F. RIESZ 与 B. SZ.-NAGY，《泛函分析》（Functional Analysis），L. F. Boron 译，昂加尔出版社（Ungar），纽约，1955 年。

[50] E. SCHMIDT, Zur Theorie der linearen und nichtlinearen Integralgleichungen. I. Teil. Entwicklung willkürlicher Funktionen nach Systemen vorgeschriebener, Math. Ann., 63 (1907), pp. 433-476.
[50] E. SCHMIDT（施密特），《线性与非线性积分方程理论（第一部分）：任意函数按给定函数系的展开》（Zur Theorie der linearen und nichtlinearen Integralgleichungen. I. Teil. Entwicklung willkürlicher Funktionen nach Systemen vorgeschriebener），《数学年刊》（Math. Ann.），第 63 卷（1907 年），第 433 - 476 页。

[51] P. H. SCHÖNEMANN, A generalized solution of the orthogonal Procrustes problem, Psychometrika, 31 (1966), pp. 1-10.
[51] P. H. SCHÖNEMANN，《正交普罗克拉斯提斯问题的广义解》（A generalized solution of the orthogonal Procrustes problem），《心理测量学》（Psychometrika），第 31 卷（1966 年），第 1 - 10 页。

[52] J. SCHUR, Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind, J. Reine Angew. Math., 147 (1917), pp. 205-232.
[52] J. SCHUR（舒尔），《论单位圆内有界的幂级数》（Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind），《纯粹与应用数学杂志》（J. Reine Angew. Math.），第 147 卷（1917 年），第 205 - 232 页。

[53] F. SMITHIES, The eigen-values and singular values of integral equations, Proc. London Math. Soc., 43 (1937), pp. 255-279.
[53] F. SMITHIES，《积分方程的特征值与奇异值》（The eigen-values and singular values of integral equations），《伦敦数学学会会议录》（Proc. London Math. Soc.），第 43 卷（1937 年），第 255 - 279 页。

[54] G.W. STEWART, On the perturbation of pseudo-inverses, projections, and linear least squares problems, SIAM Rev., 19 (1977), pp. 634-662.
[54] G.W. STEWART（斯图尔特），《伪逆、投影及线性最小二乘问题的摄动分析》（On the perturbation of pseudo-inverses, projections, and linear least squares problems），《美国工业与应用数学学会评论》（SIAM Rev.），第 19 卷（1977 年），第 634 - 662 页。

[55] G.W. STEWART AND J.-G. SUN, Matrix Perturbation Theory, Academic Press, Boston, MA, 1990.
[55] G.W. STEWART（斯图尔特）与 J.-G. SUN（孙继广），《矩阵摄动理论》（Matrix Perturbation Theory），学术出版社（Academic Press），马萨诸塞州波士顿，1990 年。

[56] D.J. STRUIK, Beltrami, Eugenio, in Dictionary of Scientific Biography I, C. C. Gillispie, ed., Charles Scribner’s Sons, New York, 1970.
[56] D.J. STRUIK，《欧金尼奥·贝尔特拉米》（Beltrami, Eugenio），收录于 C. C. Gillispie 主编的《科学传记词典》（Dictionary of Scientific Biography）第 1 卷，查尔斯·斯克里布纳之子出版社（Charles Scribner’s Sons），纽约，1970 年。

[57] J. J. SYLVESTER, A new proof that a general quadric may be reduced to its canonical form (that is, a linear function of squares) by means of a real orthogonal substitution, Messenger of Mathematics, 19 (1889), pp. 1-5.
[57] J. J. SYLVESTER（西尔维斯特），《关于一般二次型可通过实正交变换化为标准形（即平方的线性组合）的新证明》（A new proof that a general quadric may be reduced to its canonical form (that is, a linear function of squares) by means of a real orthogonal substitution），《数学通讯》（Messenger of Mathematics），第 19 卷（1889 年），第 1 - 5 页。

[58] J. J. SYLVESTER, On the reduction of a bilinear quantic of the nth order to the form of a sum of n products by a double orthogonal substitution, Messenger of Mathematics, 19 (1889), pp. 42-46.
[58] 同作者（J. J. SYLVESTER），《关于通过双重正交变换将 n 次双线性型化为 n 个乘积和形式的研究》（On the reduction of a bilinear quantic of the nth order to the form of a sum of n products by a double orthogonal substitution），《数学通讯》（Messenger of Mathematics），第 19 卷（1889 年），第 42 - 46 页。

[59] J. J. SYLVESTER, Sur la réduction biorthogonale d’une forme linéo-linéaire à sa forme canonique, Comptes Rendus de l’Académie des Sciences, Paris, 108 (1889), pp. 651-653.
[59] 同作者（J. J. SYLVESTER），《论双线性形式到其标准形的双正交约化》（Sur la réduction biorthogonale d’une forme linéo-linéaire à sa forme canonique），《法国科学院院报》（Comptes Rendus de l’Académie des Sciences, Paris），第 108 卷（1889 年），第 651 - 653 页。

[60] C.F. VAN LOAN, A general matrix eigenvalue algorithm, SIAM J. Numer. Anal., 12 (1975), pp. 819-834.
[60] C.F. VAN LOAN，《一种通用的矩阵特征值算法》（A general matrix eigenvalue algorithm），《美国工业与应用数学学会数值分析期刊》（SIAM J. Numer. Anal.），第 12 卷（1975 年），第 819 - 834 页。

[61] J. VON NEUMANN, Some matrix-inequalities and metrization of matrix-space, Tomsk. Univ. Rev., (1937), pp. 286-300.
[61] J. VON NEUMANN（冯·诺依曼），《若干矩阵不等式及矩阵空间的度量化》（Some matrix-inequalities and metrization of matrix-space），《托木斯克大学评论》（Tomsk. Univ. Rev.），1937 年，第 286 - 300 页。

[62] J. VON NEUMANN, Collected Works, A. H. Taub, ed., Pergamon, New York, 1962.
[62] 同作者（J. VON NEUMANN），《冯·诺依曼全集》（Collected Works），A. H. Taub 主编，Pergamon 出版社，纽约，1962 年。

[63] K. WEIERSTRASS, Zur Theorie der bilinearen und quadratischen Formen, Monatshefte Akademie Wissenschaften Berlin, (1868), pp. 310-338.
[63] K. WEIERSTRASS（魏尔斯特拉斯），《论双线性形式与二次形式的理论》（Zur Theorie der bilinearen und quadratischen Formen），《柏林科学院月刊》（Monatshefte Akademie Wissenschaften Berlin），1868 年，第 310 - 338 页。

[64] H. WEYL, Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung), Math. Ann., 71 (1912), pp. 441-479.
[64] H. WEYL（外尔），《线性偏微分方程特征值的渐近分布律（及其在空腔辐射理论中的应用）》（Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung)），《数学年刊》（Math. Ann.），第 71 卷（1912 年），第 441 - 479 页。

[65] H. WEYL, Inequalities between the two kinds of eigenvalues of a linear transformation, Proc. Nat. Acad. Sci., 35 (1949), pp. 408-411.
[65] 同作者（H. WEYL），《线性变换的两类特征值之间的不等式》（Inequalities between the two kinds of eigenvalues of a linear transformation），《美国国家科学院院刊》（Proc. Nat. Acad. Sci.），第 35 卷（1949 年），第 408 - 411 页。

[66] J.H. WILKINSON AND C. REINSCH, Handbook for Automatic Computation, Vol. II Linear Algebra, Springer-Verlag, New York, 1971.
[66] J.H. WILKINSON 与 C. REINSCH，《自动计算手册（第二卷：线性代数）》（Handbook for Automatic Computation, Vol. II Linear Algebra），施普林格出版社（Springer-Verlag），纽约，1971 年。

via:

On the Early History of the Singular Value Decomposition | SIAM Review
