39 Comments
sciencetalks:

No amount of words can describe how grateful I am that you compiled a stunning resource such as this one. Thank you very much for all of your time and effort! Will definitely be sharing with friends. Can't wait to read more, subscribed. :))

Rocco Jarman:

Outstanding resource. Thanks!

Sani Nassif:

Small typo in the figure: it should be c^2 = a^2 + b^2, not the square root of ...

Tivadar Danka:

Thanks, you are correct!

Adriatic:

This article is written extremely well, which is a very difficult task, as it expresses complex mathematical constructs in everyday English. This approach is the current methodology in writing books on state-of-the-art quantum field theory.

Kovats William:

Is there a typo in your formula:

<ax+y,z> =a<x,z> + <x,y> = <y,z> ?

I would have thought <ax+y,z>= a<x,z> + <y,z>

but I might just not be understanding something.

Alberto Sasso:

I do agree; that should describe the linearity property.

Claude Shannon:

This is correct. For vectors x, y, and z and a scalar a, <ax+y, z> = a<x,z> + <y,z>. You can prove it for the dot product in the plane by writing out the vectors in terms of their components: x = (x_1, x_2), y = (y_1, y_2), z = (z_1, z_2) and doing the calculations.
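
Spelled out for the dot product in the plane, that calculation is short (a quick worked version of the component expansion described above):

\begin{aligned}
\langle a\mathbf{x} + \mathbf{y}, \mathbf{z} \rangle
  &= (ax_1 + y_1)z_1 + (ax_2 + y_2)z_2 \\
  &= a(x_1 z_1 + x_2 z_2) + (y_1 z_1 + y_2 z_2) \\
  &= a\langle \mathbf{x}, \mathbf{z} \rangle + \langle \mathbf{y}, \mathbf{z} \rangle.
\end{aligned}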

Vladimir Shitov:

Thank you, that’s a wonderful article! Small typo:

> Take a small step in the direction of the gradient to arrive at the point x₁. (The step size is called the learning rate.)

The steps should be taken in the opposite direction :)
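
A minimal sketch of that sign difference on a toy loss f(x) = x², with an arbitrary learning rate and starting point (my own illustration, not code from the article): stepping against the gradient shrinks the loss, stepping along it makes the loss diverge.

```python
# Gradient descent vs. "following the gradient" on f(x) = x^2.
# Learning rate and starting point are arbitrary choices for illustration.

def grad(x):
    return 2 * x  # derivative of f(x) = x^2

lr = 0.4
x_down = x_up = 5.0

for _ in range(20):
    x_down -= lr * grad(x_down)  # step against the gradient: x shrinks toward the minimum at 0
    x_up   += lr * grad(x_up)    # step along the gradient: x (and the loss) keep growing

print(x_down)  # ~5e-14, essentially at the minimum
print(x_up)    # ~6e5, and growing every iteration
```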

Tivadar Danka:

Thanks, I'll fix this ASAP!

Vladimir Shitov:

It’s actually pretty fun to follow the gradient once and see how the loss diverges.

Saurabh Dalvi:

Much needed, thanks!

Gwyndar:

I'm going to go through this to see if I can relearn what I lost from a TBI. I was in a car accident during my initial calculus course. I loved differentials so much for their ease of use that I often wondered why they weren't taught first. The week we started integrals was the accident. I couldn't learn them and lost the differentials as well. I can't remember how to find a first derivative and have failed when trying to relearn it on my own, but I'm going to see if this can unlock some of it.

marshall murphy:

Well now

Srinivasan:

What about statistics?

Claude Shannon:

He lists statistics topics under probability though, like distributions and expected value. I'd change that section to "probability and statistics" and add the Central Limit Theorem, descriptive and inferential statistics, and hypothesis testing.

Neha Vari:

A small correction, if you could, in the Pythagorean theorem diagram: it should be c^2 = a^2 + b^2...

gr1.61803:

This is good.

Luis Rodriguez:

🙌🏻🙌🏻🙌🏻

Will Smith:

Thank you very much. This was helpful to me in understanding how our system works.

viktor:

Cool stuff! I think there's a typo in the `l_i(x)` equation where the last term should be `b_i`, not `b_1`?

John Quiggin:

This gets you to the starting line, though I think you could skip a fair bit if you were only concerned with "machine learning" as it's usually defined.

The comprehensible part of machine learning is linear discriminant analysis, picking a set of variables which can, with appropriate weights, classify inputs with a bunch of characteristics into different classes - in the archetypal example, plants into species.

Neural networks and then LLMs do the same thing, but with vastly larger data sets and with the linear discriminant replaced by a multi-level non-linear version, including inferred hidden variables. This produces much more impressive results, but from a "black box" no one can really understand.
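
For the archetypal plants-into-species example, a minimal sketch with scikit-learn's LinearDiscriminantAnalysis on the iris dataset (my own choice of library and dataset, assumed rather than taken from the comment) might look like this:

```python
# Linear discriminant analysis on the classic iris dataset: four measurements
# per flower, three species. Library and dataset are my own choices here.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis().fit(X, y)  # learns one weight vector per species

print(lda.coef_.shape)   # (3, 4): a set of weights over the 4 variables for each class
print(lda.score(X, y))   # training accuracy, roughly 0.98 on this easy dataset
```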
