The number one reason why orthogonality is essential in data science: decorrelating features.

However, the features we are given are rarely orthogonal. There are ways to fix this, and the Gram-Schmidt process is one of them.

Here is how it works.

**Before you read on:**

This week, I was featured in the DSBoost newsletter, where I talk about how I started writing about math online and whether or not you need mathematics to get started in machine learning. (Spoiler alert: you don’t.) Check it out!

If you find value in this newsletter, consider supporting me with a paid subscription or joining the early access for my Mathematics of Machine Learning book. Your support enables me to keep bringing quality math education to everyone!

The problem is simple. We are given a set of basis vectors **a₁**, **a₂**, …, **aₙ**, and we want to turn them into an orthogonal basis **q₁**, **q₂**, …, **qₙ** such that the **qᵢ** represent the same information as the **aᵢ**.

(A natural way to formalize our second condition is to require **q₁**, **q₂**, …, **qₖ** to span the same subspace as **a₁**, **a₂**, …, **aₖ** for each k.)

How do we achieve such a result? One step at a time.

Let’s look at an example! Our input consists of three highly correlated but still independent three-dimensional vectors.
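The original figure isn’t reproduced here, so as a stand-in, here is one such set of vectors. (The specific values are illustrative, not the ones from the figure.)

```python
import numpy as np

# Three highly correlated but linearly independent 3D vectors
# (illustrative values, not the ones from the original figure)
a1 = np.array([1.0, 1.0, 1.0])
a2 = np.array([1.0, 1.0, 0.0])
a3 = np.array([1.0, 0.0, 0.0])

# Independence check: the matrix with these vectors as columns is invertible
A = np.column_stack([a1, a2, a3])
print(np.linalg.det(A))  # nonzero, so the vectors form a basis
```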

If we want **q₁** to represent the same information as **a₁**, the choice **q₁ = a₁** will work perfectly. Nothing exciting so far.

Now, we need to pick **q₂** such that

- **q₂** is orthogonal to **q₁**, and
- together with **q₁**, they “represent the same information” as **a₁** and **a₂**. (That is, they generate the same subspace.)

The natural choice is to take the difference of **a₂** and its orthogonal projection onto **q₁**.
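Written out, with ⟨·, ·⟩ denoting the inner product, this step is

```latex
q_2 = a_2 - \frac{\langle a_2, q_1 \rangle}{\langle q_1, q_1 \rangle} q_1.
```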

If you are not convinced that **q₂** is orthogonal to **q₁**, here is a quick calculation to sort it out.
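For reference, the calculation is a one-liner: taking the inner product of q₂ with q₁ and using bilinearity,

```latex
\langle q_2, q_1 \rangle
= \langle a_2, q_1 \rangle
- \frac{\langle a_2, q_1 \rangle}{\langle q_1, q_1 \rangle} \langle q_1, q_1 \rangle
= 0.
```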

If you are a visual person, here is what happens.

This projection step is the essence of Gram-Schmidt, and the entire algorithm is just its iteration.

In other words, we continue by subtracting the orthogonal projections of **a₃** onto **q₁** and **q₂** from **a₃** itself.
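In formula form:

```latex
q_3 = a_3 - \frac{\langle a_3, q_1 \rangle}{\langle q_1, q_1 \rangle} q_1
          - \frac{\langle a_3, q_2 \rangle}{\langle q_2, q_2 \rangle} q_2.
```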

Here are the input and the output.

Overall, here is the output in the general case.
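In case the formula doesn’t render, the k-th step of the process is the standard Gram-Schmidt formula:

```latex
q_k = a_k - \sum_{j=1}^{k-1} \frac{\langle a_k, q_j \rangle}{\langle q_j, q_j \rangle} q_j,
\qquad k = 1, 2, \dots, n.
```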

As you can see, the Gram-Schmidt process is much simpler than the intimidating formulas suggest.

From a geometric viewpoint, the Gram-Schmidt orthogonalization process constructs a set of orthogonal vectors that span the same subspace as the original vectors by iteratively subtracting the current vector’s projections onto the previously constructed vectors.
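To make the iteration concrete, here is a minimal NumPy sketch of the process. (The function name and the example vectors are mine, not from the post.)

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize a list of linearly independent vectors.

    Each q_k is a_k minus its projections onto all previous q_j,
    so span(q_1, ..., q_k) = span(a_1, ..., a_k) for every k.
    """
    qs = []
    for a in vectors:
        q = a.astype(float).copy()
        for prev in qs:
            # Subtract the orthogonal projection of a onto the previous q
            q -= (a @ prev) / (prev @ prev) * prev
        qs.append(q)
    return qs

# Highly correlated but independent input vectors (illustrative values)
a1 = np.array([1.0, 1.0, 1.0])
a2 = np.array([1.0, 1.0, 0.0])
a3 = np.array([1.0, 0.0, 0.0])

q1, q2, q3 = gram_schmidt([a1, a2, a3])
print(q1 @ q2, q1 @ q3, q2 @ q3)  # all three are (numerically) zero
```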

This explanation is also a part of my Mathematics of Machine Learning book, where I explain all the mathematical concepts like your teachers should have, but probably never did.

