The number one reason why orthogonality is essential in data science: decorrelating features.

However, the features we are given are rarely orthogonal. There are ways to fix this, and the Gram-Schmidt process is one of them.

Here is how it works.

**Before you read on:**

This week, I was featured in the DSBoost newsletter, where I talk about how I started writing about math online and whether or not you need mathematics to get started in machine learning. (Spoiler alert: you don’t.) Check it out!

If you find value in this newsletter, consider supporting me with a paid subscription or joining the early access for my Mathematics of Machine Learning book. Your support enables me to keep bringing quality math education to everyone!

The problem is simple. We are given a set of basis vectors **a₁**, **a₂**, …, **aₙ**, and we want to turn them into an orthogonal basis **q₁**, **q₂**, …, **qₙ** such that the **qᵢ** represent the same information as the **aᵢ**.

(A natural way to formalize our second condition is to require **q₁**, **q₂**, …, **qₖ** to span the same subspace as **a₁**, **a₂**, …, **aₖ** for each k.)

How do we achieve such a result? One step at a time.

Let’s look at an example! Our input consists of three highly correlated but still independent three-dimensional vectors.
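The original figure isn’t reproduced here, so as a stand-in, here is one such set of vectors. (The specific values are illustrative, not the ones from the figure.)

```python
import numpy as np

# Three highly correlated but linearly independent 3D vectors
# (illustrative values, not the ones from the original figure)
a1 = np.array([1.0, 1.0, 1.0])
a2 = np.array([1.0, 1.0, 0.0])
a3 = np.array([1.0, 0.0, 0.0])

# Independence check: the matrix with these vectors as columns is invertible
A = np.column_stack([a1, a2, a3])
print(np.linalg.det(A))  # nonzero, so the vectors form a basis
```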

If we want **q₁** to represent the same information as **a₁**, the choice **q₁ = a₁** will work perfectly. Nothing exciting so far.

Now, we need to pick **q₂** such that

- **q₂** is orthogonal to **q₁**, and
- together with **q₁**, they “represent the same information” as **a₁** and **a₂**. (That is, they generate the same subspace.)

The natural choice is to take the difference of **a₂** and its orthogonal projection onto **q₁**.
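Written out, with ⟨·, ·⟩ denoting the inner product, this step is

```latex
q_2 = a_2 - \frac{\langle a_2, q_1 \rangle}{\langle q_1, q_1 \rangle} q_1.
```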

If you are not convinced that **q₂** is orthogonal to **q₁**, here is a quick calculation to sort it out.
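For reference, the calculation is a one-liner: taking the inner product of q₂ with q₁ and using bilinearity,

```latex
\langle q_2, q_1 \rangle
= \langle a_2, q_1 \rangle
- \frac{\langle a_2, q_1 \rangle}{\langle q_1, q_1 \rangle} \langle q_1, q_1 \rangle
= 0.
```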

If you are a visual person, here is what happens.

This projection step is the essence of Gram-Schmidt, and the entire algorithm is just its iteration.

In other words, we continue by subtracting the orthogonal projections of **a₃** onto **q₁** and **q₂** from **a₃** itself.
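In formula form:

```latex
q_3 = a_3 - \frac{\langle a_3, q_1 \rangle}{\langle q_1, q_1 \rangle} q_1
          - \frac{\langle a_3, q_2 \rangle}{\langle q_2, q_2 \rangle} q_2.
```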

Here are the input and the output.

Overall, here is the output in the general case.
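In case the formula doesn’t render, the k-th step of the process is the standard Gram-Schmidt formula:

```latex
q_k = a_k - \sum_{j=1}^{k-1} \frac{\langle a_k, q_j \rangle}{\langle q_j, q_j \rangle} q_j,
\qquad k = 1, 2, \dots, n.
```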

As you can see, the Gram-Schmidt process is much simpler than the intimidating formulas suggest.

From a geometric viewpoint, the Gram-Schmidt orthogonalization process constructs a set of orthogonal vectors that span the same subspace as the original vectors by iteratively subtracting the current vector’s projections onto the previously constructed vectors.
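To make the iteration concrete, here is a minimal NumPy sketch of the process. (The function name and the example vectors are mine, not from the post.)

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize a list of linearly independent vectors.

    Each q_k is a_k minus its projections onto all previous q_j,
    so span(q_1, ..., q_k) = span(a_1, ..., a_k) for every k.
    """
    qs = []
    for a in vectors:
        q = a.astype(float).copy()
        for prev in qs:
            # Subtract the orthogonal projection of a onto the previous q
            q -= (a @ prev) / (prev @ prev) * prev
        qs.append(q)
    return qs

# Highly correlated but independent input vectors (illustrative values)
a1 = np.array([1.0, 1.0, 1.0])
a2 = np.array([1.0, 1.0, 0.0])
a3 = np.array([1.0, 0.0, 0.0])

q1, q2, q3 = gram_schmidt([a1, a2, a3])
print(q1 @ q2, q1 @ q3, q2 @ q3)  # all three are (numerically) zero
```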

This explanation is also a part of my Mathematics of Machine Learning book, where I explain all the mathematical concepts like your teachers should have, but probably never did.

