Discussion about this post

Jaisurya Prabakaran

Well explained 💯🔥

Yaroslav Bulatov

I like the differential equations view. People tend to add a disclaimer that it's the "small step size limit", but it turns out this approximation works quite well even for large step sizes in high dimensions. The step size is bounded by the largest eigenvalue, but if most of the mass comes from the remaining dimensions, the "large" step size is actually quite small. I did some visualizations of this a while back: https://machine-learning-etc.ghost.io/gradient-descent-linear-update/
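
A rough way to check this numerically, as a sketch rather than anything taken from the linked post: take a toy quadratic loss 0.5 * sum(eig_i * x_i^2) and compare the loss after a number of gradient descent steps with the loss of the continuous gradient flow at the matching time. The 1/i eigenvalue spectrum, the all-ones starting point, the step size lr = 1/lambda_max, and the helper name `relative_gap` are all assumptions made for illustration.

```python
import math

def relative_gap(dim, steps, lr=1.0):
    """Relative loss gap between gradient flow and gradient descent.

    Toy setup (assumed for illustration): diagonal quadratic with
    eigenvalues 1/i, so lambda_max = 1 and lr = 1.0 equals 1/lambda_max.
    Every coordinate starts at x_i = 1.
    """
    eigs = [1.0 / i for i in range(1, dim + 1)]
    # Gradient descent scales coordinate i by (1 - lr * eig) at each step.
    gd = 0.5 * sum(e * (1.0 - lr * e) ** (2 * steps) for e in eigs)
    # Gradient flow at time t = lr * steps scales it by exp(-eig * t).
    flow = 0.5 * sum(e * math.exp(-2.0 * lr * steps * e) for e in eigs)
    return (flow - gd) / flow

gap_low_dim = relative_gap(dim=2, steps=10)
gap_high_dim = relative_gap(dim=1000, steps=10)
print(f"relative gap, dim=2: {gap_low_dim:.3f}, dim=1000: {gap_high_dim:.3f}")
```

In this toy setup the two-dimensional case shows a large relative gap between the discrete and continuous losses, while in the 1000-dimensional case most of the loss sits in directions where lr * eig is small and the two nearly agree. A different spectrum or starting point could change the picture.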
