Epsilons, no. 4: The Gram-Schmidt process
Orthogonalizing bases since 1883
The number one reason why orthogonality is essential in data science: uncorrelating features.
However, the features we are given are rarely such. There are ways to fix this, and the Gram-Schmidt process is one of them.
Here is how it works.
Before you read on:
This week, I was featured in the DSBoost newsletter, where I am talking about how I started writing about math online, or whether or not you need mathematics to get started in machine learning. (Spoiler alert: you don’t.) Check it out!
If you find value in this newsletter, consider supporting me with a paid subscription or joining the early access for my Mathematics of Machine Learning book. Your support enables me to keep bringing quality math education for everyone!
The Palindrome is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
The problem is simple. We are given a set of basis vectors a₁, a₂, …, aₙ, and we want to turn them into an orthogonal basis q₁, q₂, …, qₙ such that each qᵢ-s represent the same information as aᵢ.
(A natural way to formalize our second condition is to require q₁, q₂, …, qₖ to span the same subspace as a₁, a₂, …, aₖ.)
How do we achieve such a result? One step at a time.
Let’s look at an example! Our input consists of three highly correlated but still independent three-dimensional vectors.
If we want q₁ to represent the same information as a₁, the choice q₁ = a₁ will work perfectly. Nothing exciting so far.
Now, we need to pick q₂ such that
q₂ is orthogonal to q₁,
and together with q₁, they “represent the same information” as a₁ and a₂. (That is, they generate the same subspace.)
The natural choice is to take the difference of a₂ and its orthogonal projection onto q₁.
If you are not convinced that q₂ is orthogonal to q₁, here is a quick calculation to sort it out.
If you are a visual person, here is what happens.
This projection step is the essence of Gram-Schmidt, and the entire algorithm is just its iteration.
In other words, we follow by subtracting the orthogonal projection of a₃ onto q₂ and q₁ from a₃ itself.
Here are the input and the output.
Overall, here is the output in the general case.
As you can see, the Gram-Schmidt process is much simpler than the intimidating formulas suggest.
From a geometric viewpoint, the Gram-Schmidt orthogonalization process constructs a set of orthogonal vectors that span the same subspace as the original vectors by iteratively subtracting the projections of previous vectors onto the current vector.
This explanation is also a part of my Mathematics of Machine Learning book, where I explain all the mathematical concepts like your teachers should have, but probably never did.