Epsilons, no. 2: Understanding matrix multiplication
The geometric and algebraic interpretation of matrix products
Matrix multiplication is not easy to understand.
Even looking at the definition used to make me sweat, let alone trying to comprehend the pattern. Yet, there is a stunningly simple explanation behind it.
Let's pull back the curtain!
First, some notation.
The product of A and B is given by the following formula. Not the easiest (or most pleasant) to look at.
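One standard way to write this formula is sketched below. The dimensions and entry names are assumptions for illustration: A is taken to be an n × m matrix with entries a_{ik}, and B an m × l matrix with entries b_{kj}.

```latex
(AB)_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj},
\qquad 1 \le i \le n, \quad 1 \le j \le l.
```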
The situation is worse if we drop the simpler notation and fully write out AB.
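To get a feel for how quickly this grows, here is the fully written-out product in the smallest interesting case, two 2 × 2 matrices (the size is chosen purely for illustration):

```latex
AB =
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}
=
\begin{pmatrix}
a_{11} b_{11} + a_{12} b_{21} & a_{11} b_{12} + a_{12} b_{22} \\
a_{21} b_{11} + a_{22} b_{21} & a_{21} b_{12} + a_{22} b_{22}
\end{pmatrix}
```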
A quick visualization before the technical details. The element in the i-th row and j-th column of AB is the dot product of A's i-th row and B's j-th column.
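The row-times-column rule can be sketched in a few lines of plain Python. The matrices and the helper names (`dot`, `column`, `entry`) are made up for this example, and the indices are 0-based, unlike the 1-based indices in the text.

```python
# Hypothetical example matrices, stored as lists of rows.
A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]         # 3 x 2


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))


def column(M, j):
    """The j-th column of M as a list (0-based)."""
    return [row[j] for row in M]


def entry(A, B, i, j):
    """Element of AB in row i, column j: (row i of A) . (column j of B)."""
    return dot(A[i], column(B, j))


top_left = entry(A, B, 0, 0)       # 1*7 + 2*9 + 3*11
bottom_right = entry(A, B, 1, 1)   # 4*8 + 5*10 + 6*12
print(top_left, bottom_right)
```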
Let’s unwrap matrix multiplication!
We start with a special case: multiplying A by a (column) vector whose first component is 1, and the rest are 0. Let's name this special vector e₁.
Turns out, the product of A and e₁ is the first column of A.
Similarly, multiplying A by e₂ (that is, a column vector whose second component is 1 and the rest are 0) yields the second column of A.
That's a pattern!
By the same logic, we conclude that A times eₖ equals the k-th column of A.
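A quick way to convince yourself is to try it on a small example. The sketch below uses a made-up 2 × 3 matrix and two hypothetical helpers, `matvec` and `unit_vector`; note that Python indices start at 0, so `unit_vector(0, 3)` plays the role of e₁.

```python
# Hypothetical 2 x 3 matrix, stored as a list of rows.
A = [[1, 2, 3],
     [4, 5, 6]]


def matvec(M, x):
    """Matrix-vector product: one dot product per row of M."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def unit_vector(k, size):
    """Vector with 1 in position k (0-based) and 0 everywhere else."""
    return [1 if i == k else 0 for i in range(size)]


# Multiply A by each unit vector in turn and collect the results.
images = [matvec(A, unit_vector(k, 3)) for k in range(3)]
print(images)  # each entry is one column of A
```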
This sounds a bit algebra-y, so let's see this idea in geometric terms.
Matrices represent linear transformations. You know, those that stretch, skew, rotate, flip, or otherwise linearly distort the space.
The images of basis vectors form the columns of the matrix. We can visualize this in two dimensions.
Thus, the columns of A can also be thought of as the images of eₖ under A.
Moreover, we can look at a matrix-vector product as a linear combination of the column vectors. Make a mental note of this, because it is important.
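Here is a small sketch of that claim, with a made-up matrix and vector. The helper names (`matvec`, `column`, `linear_combination`) are hypothetical. Both routes should give the same result.

```python
# Hypothetical example data.
A = [[1, 2, 3],
     [4, 5, 6]]
x = [2, -1, 3]


def matvec(M, v):
    """Matrix-vector product: one dot product per row of M."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def column(M, j):
    """The j-th column of M as a list (0-based)."""
    return [row[j] for row in M]


def linear_combination(coeffs, vectors):
    """coeffs[0] * vectors[0] + coeffs[1] * vectors[1] + ..."""
    size = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(size)]


columns = [column(A, k) for k in range(len(x))]

row_by_row = matvec(A, x)                       # the usual definition
by_columns = linear_combination(x, columns)     # x[0]*col0 + x[1]*col1 + x[2]*col2
print(row_by_row, by_columns)
```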
(If unwrapping the matrix-vector product seems too complex, I got you. The computation below is the same as the one above, only in vectorized form.)
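One way to write that vectorized computation, as a sketch: the symbols a₁, …, aₘ for the columns of A and x₁, …, xₘ for the components of x are notation assumed here. The only ingredients are the linearity of A and the fact that A times eₖ is the k-th column of A.

```latex
A x
= A (x_1 e_1 + \dots + x_m e_m)
= x_1 A e_1 + \dots + x_m A e_m
= x_1 a_1 + \dots + x_m a_m
```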
Now, about the matrix product formula.
From a geometric perspective, AB is the same as first applying B, then A to our underlying space.
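To see the "first B, then A" reading in action, the sketch below compares applying the two transformations one after the other with applying the product in a single step. The matrices are made-up 2 × 2 examples (a stretch and a rotation), and `matvec` and `matmul` are hypothetical helpers.

```python
# Hypothetical 2 x 2 examples.
A = [[0, -1],
     [1, 0]]    # rotates the plane by 90 degrees counterclockwise
B = [[2, 0],
     [0, 1]]    # stretches the first coordinate by a factor of 2
x = [1, 1]


def matvec(M, v):
    """Matrix-vector product: one dot product per row of M."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def matmul(M, N):
    """Matrix product: each entry is (row of M) . (column of N)."""
    return [[sum(m * n for m, n in zip(row, col)) for col in zip(*N)]
            for row in M]


two_steps = matvec(A, matvec(B, x))   # apply B first, then A
one_step = matvec(matmul(A, B), x)    # apply AB directly
print(two_steps, one_step)
```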
Now, let’s study ABe₁, that is, the first column of AB.
Recall that matrix-vector products are linear combinations of column vectors. Thus, the first column of AB is a linear combination of A's columns, with coefficients taken from the first column of B.
We can collapse the linear combination into a single vector, resulting in a formula for the first column of AB. This is straight from the mysterious definition of the matrix product.
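Written out, the chain of steps might look like this. The notation is an assumption for illustration: a₁, …, aₘ denote the columns of A, b₁ the first column of B, and a_{ik}, b_{k1} the individual entries.

```latex
(AB) e_1 = A (B e_1) = A b_1 = \sum_{k=1}^{m} b_{k1} a_k,
\qquad\text{so}\qquad
\big( (AB) e_1 \big)_i = \sum_{k=1}^{m} a_{ik} b_{k1}.
```

The right-hand side is the sum of products from the definition, specialized to the first column.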
The same logic applies to every other column of AB, giving an explicit formula for each element of the matrix product. And we are done!
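For the curious, the explicit formula translates almost word for word into code. This is a minimal, unoptimized sketch in plain Python with made-up matrices. The function name `matmul` is hypothetical, and in practice you would reach for a numerical library instead.

```python
def matmul(A, B):
    """Matrix product from the definition: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, l = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("the number of columns of A must match the number of rows of B")
    C = [[0] * l for _ in range(n)]
    for i in range(n):          # rows of A
        for j in range(l):      # columns of B
            for k in range(m):  # the index that gets summed away
                C[i][j] += A[i][k] * B[k][j]
    return C


# Hypothetical example matrices.
A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]         # 3 x 2

C = matmul(A, B)       # a 2 x 2 matrix
print(C)
```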
Linear algebra is powerful exactly because it abstracts away the complexity of manipulating data structures like vectors and matrices.
Instead of explicitly dealing with arrays and convoluted sums, we can use simple expressions like AB.
That's a huge deal.
Without a doubt, linear algebra is one of the most important mathematical tools for a machine learning practitioner.
If you want to master it like a pro, check out my upcoming Mathematics of Machine Learning book. I release the chapters as I finish them, and all the linear algebra ones are done and published. (The linear algebra chapters are also available as a stand-alone book.)
Understanding mathematics is a superpower, and I want to help you unlock it.