For a square matrix \(A\) with linearly independent eigenvectors,

\[A = PDP^{-1}\]

where \(P\) is a matrix whose columns are the eigenvectors of \(A\) and \(D\) is a diagonal matrix with the eigenvalues of \(A\) along the diagonal.
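Here’s a minimal numpy sketch of this (the 2×2 matrix \(M\) below is just an arbitrary example with distinct eigenvalues, chosen for illustration):

>>> import numpy as np
>>> M = np.array([[4.0, 1.0],
...               [2.0, 3.0]])
>>> eigenvalues, P = np.linalg.eig(M)   # columns of P are the eigenvectors
>>> D = np.diag(eigenvalues)
>>> np.allclose(P @ D @ np.linalg.inv(P), M)
True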

Some matrices are not just square but also symmetric, e.g.

\[\begin{bmatrix} 6 & -2 & -1 \\ -2 & 6 & -1 \\ -1 & -1 & 5 \end{bmatrix}\]

The eigenvalues are \([8, 6, 3]\); their corresponding eigenvectors are \([-1, 1, 0]\), \([-1/2, -1/2, 1]\), and \([1, 1, 1]\). These eigenvectors are special: they are linearly independent and pairwise orthogonal.

For SVD, we need them to be orthonormal, meaning each should also have an l2 norm of 1. So we divide each eigenvector by its l2 norm and get \([-1/\sqrt{2}, 1/\sqrt{2}, 0]\), \([-1/\sqrt{6}, -1/\sqrt{6}, \sqrt{2}/\sqrt{3}]\), and \([1/\sqrt{3}, 1/\sqrt{3}, 1/\sqrt{3}]\).
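Here’s that normalization as a small numpy sketch (S, vecs, and P are just names I’m introducing for this check):

>>> S = np.array([[ 6., -2., -1.],
...               [-2.,  6., -1.],
...               [-1., -1.,  5.]])
>>> vecs = np.array([[-1.0,  1.0, 0.0],     # eigenvalue 8
...                  [-0.5, -0.5, 1.0],     # eigenvalue 6
...                  [ 1.0,  1.0, 1.0]])    # eigenvalue 3
>>> P = (vecs / np.linalg.norm(vecs, axis=1, keepdims=True)).T   # normalized eigenvectors as columns
>>> np.allclose(P.T @ P, np.eye(3))         # pairwise orthogonal, each with unit norm
True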

Note that if an eigenvalue is repeated, the eigenvectors you find for it aren’t automatically orthogonal, so you’ll have to use Gram-Schmidt on that eigenspace to create an orthonormal basis.

Here’s something nice about these square symmetric matrices and their orthonormalized eigenvectors: when you diagonalize \(A = PDP^{-1}\), you will notice that \(P^{-1} = P^T\).
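Continuing the small sketch above, with \(D\) holding the eigenvalues in the same order as the columns of \(P\):

>>> D = np.diag([8.0, 6.0, 3.0])
>>> np.allclose(np.linalg.inv(P), P.T)    # P is orthogonal, so its inverse is its transpose
True
>>> np.allclose(P @ D @ P.T, S)           # A = P D P^T
True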

The fact that \(P^{-1} = P^T\) in the case of orthonormalized eigenvectors of a symmetric matrix is really convenient for diagonalization. To diagonalize, just:

  • find eigenvalues and eigenvectors
  • divide eigenvectors by their l2 norm
  • put the eigenvalues into D and the eigenvectors into \(P\)
  • Write \(P^T\)
  • Diagonalize: \(A = PDP^T\)

Remember: put the eigenvalues in descending order and make sure to keep their corresponding eigenvectors in the right order if you move them!

If \(A\) isn’t square, \(AA^T\) and \(A^TA\) are square and symmetric.

We like \(AA^T\) and \(A^TA\) because they’re square and symmetric and we know how they relate to our original matrix \(A\).

Let’s orthonormally diagonalize \(AA^T\) and \(A^TA\). Let \(V\) be the matrix whose columns are the orthonormal eigenvectors of \(A^TA\), and \(U\) the matrix whose columns are the orthonormal eigenvectors of \(AA^T\), so that

\[A^T A = VD V^T,\ A A^T = UD U^T\]

(The two \(D\)’s have different sizes here, but they share the same nonzero eigenvalues.)
>>> def diagonalize(A):
...     # np.linalg.eig returns the eigenvectors as the *columns* of the second array
...     eigenvalues, eigenvectors = np.linalg.eig(A)
...     # sort the eigenvalues in descending order and reorder the matching columns
...     order = np.argsort(eigenvalues)[::-1]
...     eigenvalues = eigenvalues[order]
...     D = np.diag(eigenvalues)
...     return D, eigenvectors[:, order]
... 
>>> A = np.array([[4, 11, 14],
...               [8, 7, -2]])
>>> ATA = A.T @ A
>>> D, V = diagonalize(ATA)
>>> ATA
array([[ 80, 100,  40],
       [100, 170, 140],
       [ 40, 140, 200]])
>>> np.allclose(V @ D @ V.T, ATA)
True
>>> AAT = A @ A.T
>>> D, U = diagonalize(AAT)
>>> AAT
array([[333,  81],
       [ 81, 117]])
>>> np.allclose(U @ D @ U.T, AAT)
True

We want to find eigenvalues and eigenvectors because they allow us to learn things about a big, troublesome matrix by dealing only with scalars and vectors.

We want to write \(A\) itself as a combination of scalars and vectors. For a non-square matrix, we used \(AA^T\) and \(A^TA\) and broke them into scalars and vectors. How do we turn \(AA^T\) and \(A^TA\) back into \(A\)?

We can prove that if

\[A^TA = VDV^T,\ AA^T = UDU^T\]

then \(A = U \Sigma V^T\)

The matrix \(\Sigma\) is the element-wise square root of the matrix \(D\).

These values \(\sqrt{\lambda_i}\) are called the singular values of \(A\).
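As a quick check, reusing A and AAT from the session above: squaring the singular values returned by np.linalg.svd recovers the nonzero eigenvalues of \(AA^T\) (here 360 and 90).

>>> s = np.linalg.svd(A, compute_uv=False)              # singular values, in descending order
>>> np.allclose(s**2, np.sort(np.linalg.eigvalsh(AAT))[::-1])
True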

Regarding the dimensions of \(\Sigma\): suppose \(A: [m\times n]\), so \(A^T:[n\times m]\), \(A^TA: [n \times n]\), and \(AA^T: [m \times m]\). Then in \(A = U\Sigma V^T\) we have \([m \times n] = [m\times m]\,\Sigma\,[n \times n]\).

In order to multiply these and get an \([m \times n]\) result, \(\Sigma\) needs to be \([m \times n]\), the same dimensions as \(A\). If \(A: [2 \times 3]\), then

\[\Sigma = \begin{bmatrix} \sqrt{\lambda_1} & 0 & 0 \\ 0 & \sqrt{\lambda_2} & 0 \end{bmatrix}\]

Unlike diagonalization, SVD is going to help us pull really important information out of \(A\). How? Something called “spectral decomposition”. We are going to rewrite \(A = U \Sigma V^T\) as a big linear combination and see how spectacularly useful its terms are.

\[\begin{aligned} A &= U\Sigma V^T \\ &= \begin{bmatrix} \vdots & \vdots & \vdots \\ u_1 & \cdots &u_n \\ \vdots & \vdots & \vdots \end{bmatrix} \begin{bmatrix} \sqrt{\lambda_1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sqrt{\lambda_n} \end{bmatrix} \begin{bmatrix} \cdots & {v_1}^T & \cdots \\ \cdots & \vdots & \cdots \\ \cdots & {v_n}^T & \cdots \end{bmatrix} \\ &= u_1 \sqrt{\lambda_1} {v_1}^T + \cdots + u_n \sqrt{\lambda_n} {v_n}^T \end{aligned}\]

Why is \(A = u_1 \sqrt{\lambda_1} {v_1}^T + \cdots + u_n \sqrt{\lambda_n} {v_n}^T\) desirable? Because each term of this spectral decomposition pulls a specific, significant piece of information out of \(A\). The first term, \(u_1 \sqrt{\lambda_1} {v_1}^T\), is called the first-order approximation of \(A\) and is special because it carries the most important information, the “essence” of \(A\) if you will. Similar to a Taylor series, the addition of each subsequent term brings you closer to \(A\).

Example using np.linalg.svd to decompose \(A\):

>>> A
array([[ 4, 11, 14],
       [ 8,  7, -2]])
>>> m, n = A.shape
>>> U, s, VT = np.linalg.svd(A)
>>> Sigma = np.zeros((m, n))      # Sigma has the same shape as A
>>> Sigma[:m, :m] = np.diag(s)    # singular values go on the diagonal
>>> np.allclose(U @ Sigma @ VT, A)
True
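Using the U, s, and VT from this session, here’s a small sketch of the spectral decomposition above: summing all the terms \(u_i \sqrt{\lambda_i} {v_i}^T\) rebuilds \(A\), and the first term alone is the first-order approximation.

>>> A1 = s[0] * np.outer(U[:, 0], VT[0])                 # first-order approximation
>>> terms = [s[i] * np.outer(U[:, i], VT[i]) for i in range(len(s))]
>>> np.allclose(sum(terms), A)                           # all the terms together give back A
True
>>> np.allclose(np.linalg.norm(A - A1), s[1])            # the leftover error is exactly the next singular value
True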

For arrays with more than two dimensions, np.linalg.svd works on the last two axes and treats everything before them as a stack of matrices:

>>> B = np.random.randint(10,size=(4,6,3))
>>> m,n,d = B.shape
>>> u, s, vh = np.linalg.svd(B, full_matrices=True)
>>> np.allclose(B, np.matmul(u[...,:d] * s[..., np.newaxis, :], vh))
True
>>> np.allclose(B, np.matmul(u[...,:d], s[..., np.newaxis] * vh))
True

To use SVD for image compression, see the example here.

