44 Comments
sciencetalks's avatar

No amount of words can describe how grateful I am for you to have compiled a stunning resource such as this one. Thank you very much for all of your time and effort! Will definitely be sharing with friends. Can't wait to read more, subscribed. :))

Rocco Jarman's avatar

Outstanding resource. Thanks!

Sani Nassif's avatar

Small typo in figure. c^2 = a^2 + b^2, not the square root of ...

Tivadar Danka's avatar

Thanks, you are correct!

Adriatic's avatar

This article is written extremely well - a very difficult task, as it represents complex math constructs using everyday English. This approach is the current methodology in writing books on state-of-the-art quantum field theory.

Kovats William's avatar

Is there a typo in your formula:

<ax+y,z> =a<x,z> + <x,y> = <y,z> ?

I would have thought <ax+y,z>= a<x,z> + <y,z>

but I might just not be understanding something.

Alberto Sasso's avatar

I agree, that should describe the linearity property.

Claude Shannon's avatar

This is correct. For vectors x, y, and z and a scalar a, <ax+y, z> = a<x,z> + <y,z>. You can prove it for the dot product in the plane by writing out the vectors in terms of their components: x = (x_1, x_2), y = (y_1, y_2), z = (z_1, z_2) and doing the calculations.
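
A quick numerical sanity check of that identity, as a hypothetical sketch using NumPy and the ordinary dot product in the plane (the vectors and scalar are arbitrary, and the article's inner product may be more general):

```python
import numpy as np

# Two-dimensional vectors and a scalar, chosen arbitrarily for illustration.
x = np.array([1.0, 2.0])
y = np.array([-3.0, 0.5])
z = np.array([4.0, -1.0])
a = 2.5

# Linearity in the first argument: <ax + y, z> = a<x, z> + <y, z>
lhs = np.dot(a * x + y, z)
rhs = a * np.dot(x, z) + np.dot(y, z)

print(lhs, rhs)            # both evaluate to the same number (-7.5 here)
assert np.isclose(lhs, rhs)
```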

Vladimir Shitov's avatar

Thank you, that’s a wonderful article! Small typo:

> Take a small step in the direction of the gradient to arrive at the point x₁. (The step size is called the learning rate.)

The steps should be taken in the opposite direction :)

Tivadar Danka's avatar

Thanks, I'll fix this ASAP!

Vladimir Shitov's avatar

It’s actually pretty fun to follow the gradient once and see how the loss diverges.
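
For anyone who wants to try that, here is a minimal sketch on a one-dimensional quadratic loss (my own toy example, not the article's) contrasting a step against the gradient with a step along it:

```python
# Gradient descent vs. "following the gradient" on f(x) = x^2, whose gradient is 2x.
# Stepping against the gradient converges to the minimum; stepping along it diverges.

def grad(x):
    return 2 * x

learning_rate = 0.4
x_descent = x_ascent = 3.0

for _ in range(10):
    x_descent -= learning_rate * grad(x_descent)  # x_{k+1} = x_k - lr * f'(x_k)
    x_ascent  += learning_rate * grad(x_ascent)   # following the gradient instead

print(f"descent ends near {x_descent:.6f}")  # approaches the minimum at 0
print(f"ascent  ends near {x_ascent:.2e}")   # blows up, i.e. the loss diverges
```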

Saurabh Dalvi's avatar

Much needed, thanks!

viktor's avatar

Cool stuff! I think there's a typo in the `l_i(x)` equation where the last term should be `b_i`, not `b_1`?

Gwyndar's avatar

I'm going to go through this to see if I can relearn what I lost from a TBI. I was in a car accident during my initial calculus course. I loved differentials so much for their ease of use that I often wondered why they weren't taught first. The accident happened the week we started integrals. I couldn't learn them and lost the differentials as well. I can't remember how to find a first derivative and have failed when trying to relearn it on my own, but I'm going to see if this can unlock some of it.

marshall murphy's avatar

Well now

Srinivasan's avatar

What about statistics?

Claude Shannon's avatar

He lists statistics topics under probability though, like distributions and expected value. I'd change that section to "probability and statistics" and add the Central Limit Theorem, descriptive and inferential statistics, and hypothesis testing.
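
As a small illustration of one of those suggested additions, here is a hedged sketch of the Central Limit Theorem via simulation (the exponential distribution and the sample sizes are arbitrary choices of mine, not anything from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw many samples from a decidedly non-normal distribution (exponential),
# then look at how the sample means are distributed.
n_samples, sample_size = 10_000, 50
samples = rng.exponential(scale=1.0, size=(n_samples, sample_size))
means = samples.mean(axis=1)

# The CLT says the sample means are approximately normal, here with mean 1
# and standard deviation 1 / sqrt(sample_size).
print(means.mean())   # close to 1.0
print(means.std())    # close to 1 / sqrt(50) ≈ 0.141
```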

Neha Vari's avatar

A small correction, if you could make it in the Pythagoras theorem diagram: it should be c^2 = a^2 + b^2.

gr1.61803's avatar

This is good.

Luis Rodriguez's avatar

🙌🏻🙌🏻🙌🏻

Will Smith's avatar

Thank you very much. This was helpful to me, to help me understand how our system works.

John Quiggin's avatar

This gets you to the starting line, though I think you could skip a fair bit if you were only concerned with "machine learning" as it's usually defined.

The comprehensible part of machine learning is linear discriminant analysis: picking a set of variables that, with appropriate weights, classifies inputs described by a bunch of characteristics into different classes - in the archetypal example, plants into species.

Neural networks and then LLMs do the same thing, but with vastly increased data sets and replacing the linear discriminant with a multi-level non-linear version, including inferred hidden variables. This produces much more impressive results, but from a "black box" that no one can really understand.
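
To make the "plants into species" example concrete, here is a hedged sketch using scikit-learn's LinearDiscriminantAnalysis on the classic iris dataset (my own illustration, not something from the article or the comment above):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# The archetypal example: classify iris plants into species from four measurements.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LDA finds weighted combinations of the measurements that separate the classes.
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

print(lda.score(X_test, y_test))   # accuracy on held-out plants, typically above 0.95
print(lda.coef_.shape)             # one weight vector per class: (3, 4)
```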