7 Comments
Jaisurya Prabakaran

Well explained 💯🔥

Yaroslav Bulatov

I like the differential equations view. People tend to add a disclaimer that it only holds in the "small step size limit", but it turns out this approximation works quite well even for large step sizes in high dimensions. The step size is bounded by the largest eigenvalue, but if most of the mass comes from the remaining dimensions, the "large" step size is actually quite small. I did some visualizations on this a while back: https://machine-learning-etc.ghost.io/gradient-descent-linear-update/
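
To make this concrete, here is a minimal NumPy sketch of the effect described above (the quadratic and all parameters are hypothetical, not taken from the linked post): gradient descent with a step size just under the 2/λ_max stability bound still tracks the gradient-flow solution closely in the many small-eigenvalue directions, which is where most of the loss lives.

```python
import numpy as np

# Hypothetical setup, purely for illustration: a quadratic loss
# f(x) = 0.5 * x^T A x with one large eigenvalue and many small ones,
# where A is taken to be diagonal.
rng = np.random.default_rng(0)
d = 1000
eigvals = np.concatenate(([10.0], rng.uniform(0.01, 0.1, d - 1)))

x0 = rng.normal(size=d)
lr = 1.9 / eigvals.max()   # a "large" step, just under the 2 / lambda_max stability bound
steps = 50


def loss(x):
    # f(x) = 0.5 * x^T A x with A = diag(eigvals)
    return 0.5 * np.sum(eigvals * x**2)


# Discrete gradient descent: x_{k+1} = x_k - lr * A x_k.
x_gd = x0.copy()
for _ in range(steps):
    x_gd = x_gd - lr * eigvals * x_gd

# Gradient flow x'(t) = -A x(t) has the closed-form solution x(t) = exp(-A t) x0;
# evaluate it at the "elapsed time" t = steps * lr.
x_flow = np.exp(-eigvals * steps * lr) * x0

# Most of the loss sits in the many small-eigenvalue directions, where the
# discrete and continuous trajectories nearly coincide.
print(f"gradient descent loss: {loss(x_gd):.4f}")
print(f"gradient flow loss:    {loss(x_flow):.4f}")
```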

Tivadar Danka

Really cool visualizations, thanks for sharing!

Un Poeta

I think there's a typo in the derivative definition: isn't the limit supposed to go to zero?

Tivadar Danka

Thanks, you are correct! Fixing it now.

Rabin Adhikari

I didn't get how you arrived at the equations for `monotonicity describes long-term behavior`.

Tivadar Danka

This is not a trivial step; it involves an argument using the Picard-Lindelöf theorem about the existence and uniqueness of IVP solutions. The gist is: if x(t) is, say, increasing and bounded, then it must have an asymptotic limit. In turn, that limit must be an equilibrium solution, as guaranteed by the Picard-Lindelöf theorem.

I omitted the technical details because the result is quite intuitive, and the Picard-Lindelöf theorem is well beyond the article's scope.
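
For readers who want the omitted step, here is one way the argument can be fleshed out (a sketch only; the specific hypotheses on f are my assumptions and are not spelled out in the article):

```latex
% Setting: the autonomous ODE x'(t) = f(x(t)) with f continuous and
% locally Lipschitz, so Picard-Lindelöf gives existence and uniqueness
% of the IVP solution.
%
% 1. If x(t) is increasing and bounded above, monotone convergence gives
%    an asymptotic limit x(t) -> x* as t -> infinity.
% 2. If f(x*) were nonzero, say f(x*) > 0, then by continuity
%    f(x(t)) >= f(x*)/2 for all large t, forcing x(t) to grow without
%    bound and contradicting boundedness. Hence f(x*) = 0.
\[
  x'(t) = f\bigl(x(t)\bigr), \qquad
  x(t) \nearrow x^{*} \;\Longrightarrow\; f(x^{*}) = 0,
\]
% so the limit x* is an equilibrium solution of the ODE.
```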