Singular Value Decomposition

Table of Contents

1. Singular Value Decomposition

We would like to be able to write any matrix \(A\) as a product of simpler matrices, not just the square diagonalizable matrices handled by eigendecomposition. To make this possible for every \(m \times n\) matrix we need one set of orthonormal vectors \(\mathbf{v}_1, \dots, \mathbf{v}_n\) in \(\mathbb{R}^n\) and a second set of orthonormal vectors \(\mathbf{u}_1, \dots, \mathbf{u}_m\) in \(\mathbb{R}^m\). Instead of \(A\mathbf{x} = \lambda\mathbf{x}\), we want \(A\mathbf{v} = \sigma\mathbf{u}\) for scalars \(\sigma \geq 0\), called the singular values.

We can encode the vectors \(\mathbf{v}_i\) for \(1 \leq i \leq n\) as the columns of a matrix \(V\), and the products \(\sigma_j\mathbf{u}_j\) as the columns of the product \(U\Sigma\), where \(\Sigma\) is a diagonal matrix with the singular values on the diagonal (shown square here; in general it is \(m \times n\), padded with zeros). This works because:

\begin{align} \begin{bmatrix} \: \\ \mathbf{u}_1 & \cdots & \mathbf{u}_m \\ \: \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_m \end{bmatrix} = \begin{bmatrix} \: \\ \sigma_1\mathbf{u}_1 & \cdots & \sigma_m\mathbf{u}_m \\ \: \end{bmatrix} \notag \end{align}

We can now represent the relationship as \(AV = U\Sigma\). Critically, \(U\) and \(V\) are orthogonal matrices, since their columns are orthonormal. Therefore, multiplying by \(V^{-1} = V^T\) on the right gives the singular value decomposition of \(A\):

\begin{align} \boxed{A = U \Sigma V^T} \end{align}

where \(U\) is an \(m \times m\) orthogonal matrix formed from the \(\mathbf{u}\) vectors, \(\Sigma\) is an \(m \times n\) "diagonal" matrix with the singular values \(\sigma_i\) on its diagonal, and \(V\) is an \(n \times n\) orthogonal matrix formed from the \(\mathbf{v}\) vectors.
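As a quick numerical aside (a sketch using NumPy, with an arbitrary matrix chosen only for illustration), `np.linalg.svd` computes exactly these three factors, returning \(V^T\) rather than \(V\):

```python
import numpy as np

# Any m x n matrix works; this one is just an illustration.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# full_matrices=True gives U as m x m and Vt as n x n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n "diagonal" Sigma from the vector of singular values.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))  # True
```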

1.1. Proof

(1) tells us everything we need to find the SVD, except for how to obtain the \(\mathbf{u}\) and \(\mathbf{v}\) vectors. One way to do this is to form the symmetric matrices \(A^TA\) and \(AA^T\):

\begin{align} A^TA &= (V\Sigma^TU^T)(U\Sigma V^T) = V\Sigma^T\Sigma V^T \notag \\ AA^T &= (U\Sigma V^T)(V\Sigma^TU^T) = U\Sigma\Sigma^TU^T \notag \end{align}

Notice that both are in the orthogonally diagonalized form \(Q\Lambda Q^T\). Thus the columns of \(V\) are eigenvectors of \(A^TA\), and the columns of \(U\) are eigenvectors of \(AA^T\). Additionally, since \(\Sigma^T\Sigma\) and \(\Sigma\Sigma^T\) have the same nonzero diagonal entries, the \(\sigma_i^2\) are the nonzero eigenvalues of both \(A^TA\) and \(AA^T\).
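This claim is easy to check numerically. The sketch below (NumPy, with an arbitrary example matrix) verifies that the columns of \(V\) and \(U\) returned by `np.linalg.svd` diagonalize \(A^TA\) and \(AA^T\) with eigenvalues \(\sigma_i^2\):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
U, s, Vt = np.linalg.svd(A)  # s holds the singular values, descending
V = Vt.T

# Columns of V are eigenvectors of A^T A with eigenvalues sigma_i^2.
print(np.allclose(A.T @ A @ V, V @ np.diag(s**2)))  # True

# The first r = len(s) columns of U carry the nonzero eigenvalues of A A^T;
# any remaining columns of U are eigenvectors with eigenvalue zero.
print(np.allclose(A @ A.T @ U[:, :2], U[:, :2] @ np.diag(s**2)))  # True
```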

However, since the SVD requires that \(A\mathbf{v}_k = \sigma_k\mathbf{u}_k\), we are not quite finished — the specific \(\mathbf{u}\) vectors to use depend on what we choose for the \(\mathbf{v}\) vectors. For example, if \(\lambda\) is a double eigenvalue, there is a whole plane of eigenvectors to choose from; once we choose two specific \(\mathbf{v}\) vectors from that plane, \(A\mathbf{v} = \sigma\mathbf{u}\) determines the corresponding \(\mathbf{u}\) vectors.

To do this, we start with the \(\mathbf{v}\) vectors. Choose orthonormal eigenvectors \(\mathbf{v}_1, \dots, \mathbf{v}_r\) of \(A^TA\) corresponding to the \(r\) positive eigenvalues. Then, since each \(\sigma_i^2\) is an eigenvalue of \(A^TA\), choose \(\sigma_i = \sqrt{\lambda_i}\). Since we want \(A\mathbf{v} = \sigma\mathbf{u}\), we divide by \(\sigma\) to get:

\begin{align} \mathbf{u}_i = \frac{A\mathbf{v}_i}{\sigma_i}, \: 1 \leq i \leq r \end{align}

Then \(\mathbf{u}_1, \dots, \mathbf{u}_r\) is an orthonormal set in \(\mathbb{R}^m\). In order to get all \(m\) of our \(\mathbf{u}\) vectors, we must extend this set to an orthonormal basis of \(\mathbb{R}^m\). This is done by thinking of \(\mathbf{u}_1, \dots, \mathbf{u}_r\) as spanning a subspace \(W\) of \(\mathbb{R}^m\). Then \(W^{\perp}\) is the nullspace of the matrix with rows \(\mathbf{u}_1^T, \dots, \mathbf{u}_r^T\), and Gram-Schmidt turns any basis of that nullspace into an orthonormal one. We now have all the necessary components for the SVD.
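The whole construction can be sketched in NumPy. This is an illustrative implementation under my own naming and tolerance choices (`svd_by_hand`, `tol`), not a numerically robust one; the QR factorization of \([\,U_r \mid I\,]\) plays the role of the Gram-Schmidt extension step:

```python
import numpy as np

def svd_by_hand(A, tol=1e-12):
    """Illustrative SVD via the construction above (not numerically robust)."""
    m, n = A.shape

    # v's and sigma^2's from the symmetric eigenproblem for A^T A.
    eigvals, V = np.linalg.eigh(A.T @ A)           # ascending order
    order = np.argsort(eigvals)[::-1]              # re-sort descending
    eigvals, V = eigvals[order], V[:, order]
    sigmas = np.sqrt(np.clip(eigvals, 0.0, None))  # clip tiny negatives
    r = int(np.sum(sigmas > tol))                  # rank of A

    # u_i = A v_i / sigma_i for the r positive singular values.
    U_r = A @ V[:, :r] / sigmas[:r]

    # Extend u_1..u_r to an orthonormal basis of R^m: QR on [U_r | I]
    # performs the Gram-Schmidt step. Householder QR may flip signs of
    # the first r columns, so we restore the computed u_i afterwards.
    U, _ = np.linalg.qr(np.hstack([U_r, np.eye(m)]))
    U[:, :r] = U_r

    Sigma = np.zeros((m, n))
    Sigma[:r, :r] = np.diag(sigmas[:r])
    return U, Sigma, V
```

Calling `svd_by_hand` on any real matrix returns factors with \(U\Sigma V^T = A\) up to floating-point error.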

Example: SVD

We want to find the SVD for:

\begin{align} A = \begin{bmatrix} 3 & -3 \\ 0 & 0 \\ 1 & 1 \end{bmatrix} \notag \end{align}

First, to find the \(\mathbf{u}\) vectors and the \(\mathbf{v}\) vectors, we calculate \(A^TA\):

\begin{align} A^TA = \begin{bmatrix} 3 & 0 & 1 \\ -3 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 & -3 \\ 0 & 0 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 10 & -8 \\ -8 & 10 \end{bmatrix} \notag \end{align}

The eigenvalues, singular values, and orthonormal eigenvectors are:

\begin{align} \lambda &= 18, \: \sigma_1 = 3\sqrt{2}, \: \mathbf{v}_1 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \notag \\ \lambda &= 2, \: \sigma_2 = \sqrt{2}, \: \mathbf{v}_2 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \notag \end{align}
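These values can be double-checked numerically (a quick NumPy sketch):

```python
import numpy as np

A = np.array([[3.0, -3.0], [0.0, 0.0], [1.0, 1.0]])

# Eigenvalues of the symmetric matrix A^T A, in ascending order.
eigvals = np.linalg.eigvalsh(A.T @ A)
print(np.allclose(eigvals, [2.0, 18.0]))  # True

# Singular values are the square roots, taken in descending order.
sigmas = np.sqrt(eigvals[::-1])
print(np.allclose(sigmas, [3 * np.sqrt(2), np.sqrt(2)]))  # True
```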

Now, we can find the \(\mathbf{u}\) vectors using (2):

\begin{align} \mathbf{u}_1 &= \frac{A\mathbf{v}_1}{\sigma_1} = \begin{bmatrix} -1 \\ 0 \\ 0 \end{bmatrix} \notag \\ \mathbf{u}_2 &= \frac{A\mathbf{v}_2}{\sigma_2} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \notag \end{align}

Since \(A\) is a \(3 \times 2\) matrix but we only have two \(\mathbf{u}\) vectors, we need to extend them to an orthonormal basis of \(\mathbb{R}^3\). Here, by inspection, \(\mathbf{u}_3 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\). Then:

\begin{align} \boxed{U = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \: \Sigma = \begin{bmatrix} 3\sqrt{2} & 0 \\ 0 & \sqrt{2} \\ 0 & 0 \end{bmatrix}, \: V = \begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}} \notag \end{align}
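As a final check (a NumPy sketch), the boxed factors do reconstruct \(A\) up to rounding:

```python
import numpy as np

s2 = np.sqrt(2)
U = np.array([[-1, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=float)
Sigma = np.array([[3 * s2, 0], [0, s2], [0, 0]])
V = np.array([[-1 / s2, 1 / s2], [1 / s2, 1 / s2]])
A = np.array([[3.0, -3.0], [0.0, 0.0], [1.0, 1.0]])

print(np.allclose(U @ Sigma @ V.T, A))  # True
```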
Last modified: 2025-11-03 09:23