Matrix factorizations are the pinnacle results of linear algebra.
Factorizations enable both theoretical and numerical results, ranging from inverting matrices to dimensionality reduction of feature-rich datasets. Check out my Linear Algebra for Machine Learning book if you don’t believe me.
In this mini-series within the Epsilon series, we’ll take a quick look at four of the most important matrix factorization methods:
the LU decomposition,
the QR decomposition,
the spectral theorem,
and the Singular Value Decomposition.
Let’s start with the first one: the LU decomposition, that is, the factorization of a matrix into the product of a lower and an upper triangular one.
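In other words, we are looking for a factorization of the form below, illustrated here in the 3 x 3 case (the asterisks stand for arbitrary entries):

$$
A = LU, \quad L = \begin{bmatrix} * & 0 & 0 \\ * & * & 0 \\ * & * & * \end{bmatrix}, \quad U = \begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \end{bmatrix},
$$

where L is lower triangular and U is upper triangular.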
Why is such a decomposition useful? There are two main applications:
computing determinants,
and inverting matrices.
For instance, check out how the LU decomposition simplifies computing the determinant. (Recall that the determinant of an upper or lower triangular matrix is just the product of its diagonal elements.)
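In formulas: if A = LU, then

$$
\det(A) = \det(L)\det(U) = \prod_{i=1}^{n} l_{ii} \prod_{i=1}^{n} u_{ii},
$$

where the l_{ii} and u_{ii} are the diagonal entries of L and U. No cofactor expansion needed.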
We’ll demonstrate the technique in the 3 x 3 case. I encourage you to pick up a pen and paper and carry out the calculations by yourself. Trust me, it’ll be much easier to follow.
Let’s go back to square one: where do matrices come from?
For one, systems of linear equations. They are used to model various phenomena ranging from economic processes to biological systems.
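To have something concrete in front of us, here is a generic 3 x 3 system; the coefficients a_{ij} and the right-hand sides b_i are just placeholders:

$$
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + a_{13} x_3 &= b_1, \\
a_{21} x_1 + a_{22} x_2 + a_{23} x_3 &= b_2, \\
a_{31} x_1 + a_{32} x_2 + a_{33} x_3 &= b_3.
\end{aligned}
$$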
Collecting the coefficients into a matrix, we can write the above system in a compact way.
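That is,

$$
A\mathbf{x} = \mathbf{b}, \quad
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.
$$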
How do we solve systems of linear equations? By transforming them.
More specifically, there are two operations that we can perform without changing the set of solutions: 1) multiplying equations with a scalar and 2) adding equations together.
Systems of linear equations are solved by transforming them into the so-called echelon form, where the leading entry (that is, the left-most nonzero entry) of every nonzero row lies to the right of the leading entry of the row above it. (If this is possible, that is.)
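In the 3 x 3 case (assuming none of the pivots turn out to be zero), the echelon form is simply an upper triangular matrix:

$$
\begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \end{bmatrix}.
$$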
In the 3 x 3 case, it is done in two steps.
First, we multiply the first row by a₂₁/a₁₁ and subtract it from the second row; then we multiply the first row by a₃₁/a₁₁ and subtract it from the third row. This zeroes out the first column below the first element.
(Here comes the part where you are encouraged to manually carry out the calculations by yourself.)
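Carried out on our generic matrix A, the first step yields

$$
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\[2pt]
0 & a_{22} - \frac{a_{21}}{a_{11}} a_{12} & a_{23} - \frac{a_{21}}{a_{11}} a_{13} \\[2pt]
0 & a_{32} - \frac{a_{31}}{a_{11}} a_{12} & a_{33} - \frac{a_{31}}{a_{11}} a_{13}
\end{bmatrix}.
$$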
The second step is similar. (I’ll leave you to figure out the details.)
These operations can be translated to the language of matrices: multiplying the coefficient matrix with well-crafted matrices will yield the echelon form just the same. The first step is below.
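Here it is, written out. (I'll call the transformation matrix E₁; the name is just shorthand.) Multiplying A with it from the left performs exactly the row operations of the first step:

$$
E_1 A = \begin{bmatrix} 1 & 0 & 0 \\[2pt] -\frac{a_{21}}{a_{11}} & 1 & 0 \\[2pt] -\frac{a_{31}}{a_{11}} & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.
$$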
The second step can be done similarly, yielding the echelon form.
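Denoting the entries after the first step by a′_{ij} (a notation I'm introducing just for bookkeeping), the second step reads

$$
E_2 (E_1 A) = \begin{bmatrix} 1 & 0 & 0 \\[2pt] 0 & 1 & 0 \\[2pt] 0 & -\frac{a'_{32}}{a'_{22}} & 1 \end{bmatrix} (E_1 A) = U,
$$

where U is upper triangular, that is, the echelon form we are after.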
The two transformation matrices can be inverted and brought to the right.
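In symbols: from E₂E₁A = U, we get

$$
A = E_1^{-1} E_2^{-1} U.
$$

(Conveniently, the inverse of such a transformation matrix is obtained by simply flipping the signs of its off-diagonal entries.)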
This is (almost) the LU decomposition. By multiplying the two inverse transformation matrices together, we obtain the LU decomposition.
Here it is in its full glory.
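Using the primed notation from before, and writing a″₃₃ for the bottom-right entry after the second step, the 3 x 3 case reads

$$
A = \underbrace{\begin{bmatrix} 1 & 0 & 0 \\[2pt] \frac{a_{21}}{a_{11}} & 1 & 0 \\[2pt] \frac{a_{31}}{a_{11}} & \frac{a'_{32}}{a'_{22}} & 1 \end{bmatrix}}_{L}
\underbrace{\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a'_{22} & a'_{23} \\ 0 & 0 & a''_{33} \end{bmatrix}}_{U}.
$$

Notice that L simply collects the multipliers used during the elimination, with 1s on the diagonal.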
As triangular matrices are easily inverted, this is usually the first step of inverting matrices.
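To see the idea in action numerically, here is a minimal sketch in NumPy. It is the textbook algorithm without pivoting, so it assumes all pivots are nonzero, and the function names are mine, not a standard API. Once L and U are known, solving Ax = b only takes one forward and one back substitution; inverting A amounts to solving Ax = eᵢ for each standard basis vector eᵢ, reusing the same factors.

```python
import numpy as np

def lu_no_pivot(A):
    """LU decomposition without pivoting: returns L, U with A = L @ U.
    Assumes every pivot encountered along the way is nonzero."""
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]       # the multiplier, e.g. a21 / a11
            U[i, :] -= L[i, k] * U[k, :]      # zero out the entry below the pivot
    return L, U

def solve_with_lu(A, b):
    """Solve Ax = b via Ly = b (forward substitution), then Ux = y (back substitution)."""
    L, U = lu_no_pivot(A)
    n = len(b)
    y = np.zeros(n)
    x = np.zeros(n)
    for i in range(n):                         # forward substitution (L has 1s on its diagonal)
        y[i] = b[i] - L[i, :i] @ y[:i]
    for i in reversed(range(n)):               # back substitution
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
b = np.array([1.0, 2.0, 3.0])

L, U = lu_no_pivot(A)
print(np.allclose(L @ U, A))                   # True: the factors reproduce A
print(np.allclose(A @ solve_with_lu(A, b), b)) # True: the LU-based solve works
```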
One final caveat. The LU decomposition is not always possible. Remember how we obtained the echelon form? It can happen that at some step, we are forced to divide by zero, and the process breaks down. (For instance, if a₁₁ = 0, the very first step already fails.)
Fortunately, this is rarely the case. When it works, the LU decomposition is an extremely useful technique.
Without a doubt, linear algebra is one of the most important mathematical tools for a machine learning practitioner.
If you want to master it like a pro, check out my upcoming Mathematics of Machine Learning book. I release the chapters as I finish them, and all the linear algebra ones are done and published. (The linear algebra chapters are also available as a stand-alone book.)
Understanding mathematics is a superpower, and I want to help you unlock it.