Discussion about this post

Jaisurya Prabakaran:

Well explained 💯🔥

Yaroslav Bulatov:

I like the differential equations view. People tend to add a disclaimer that it's the "small step size limit", but it turns out this approximation works quite well even for large step sizes in high dimensions. The step size is bounded by the largest eigenvalue, but if most of the mass comes from the remaining dimensions, the "large" step size is actually quite small. I did some visualizations on this a while back https://machine-learning-etc.ghost.io/gradient-descent-linear-update/
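To make the point concrete, here is a minimal sketch (my own setup, not taken from the linked post, with an assumed separable quadratic loss and a made-up eigenvalue spectrum): gradient descent on a quadratic is only stable for a step size below 2 / lam_max, but when the bulk of the eigenvalues sit far below lam_max, a step near that limit still tracks the gradient-flow ODE closely for most coordinates.

```python
# Rough sketch (assumed setup): separable quadratic loss f(x) = 0.5 * sum(lam * x**2).
# Gradient descent is stable only for eta < 2 / lam.max(), yet when most eigenvalues
# are far below lam.max(), that "large" eta is small for the bulk of the spectrum,
# so the discrete iterates stay close to the gradient-flow (continuous-time) solution.
import numpy as np

rng = np.random.default_rng(0)
d = 1000
lam = rng.exponential(scale=1.0, size=d)  # bulk of the spectrum is O(1) or smaller (assumed)
lam[0] = 50.0                             # one outlier eigenvalue sets the step-size limit
eta = 1.9 / lam.max()                     # near the stability limit 2 / lam_max
steps = 200

# Per-coordinate closed forms: GD iterates vs. gradient flow at time t = steps * eta.
x0 = rng.normal(size=d)
x_gd = (1.0 - eta * lam) ** steps * x0
x_flow = np.exp(-lam * eta * steps) * x0

rel_gap = np.linalg.norm(x_gd - x_flow) / np.linalg.norm(x_flow)
print(f"eta = {eta:.4f}  (stability limit 2/lam_max = {2 / lam.max():.4f})")
print(f"relative gap between GD and gradient flow: {rel_gap:.3e}")
```

The printed gap stays small because the coordinates that dominate the norm are the ones with small eigenvalues, for which eta * lam is tiny and the discrete update is essentially the ODE step.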

