5 Comments
Mike X Cohen, PhD:

Least-squares *and* LLMs in the same post? Definitely something I want to read ☺️

Neural Foundry:

This series has been incredibly instructive. Applying least squares to model GPT activations is such a clever approach to understanding LLM internals. The progression from theory to GPT-2 analysis really demonstrates how foundational math translates into practical ML applications. Can't wait to dive into the code on GitHub.

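As a rough illustration of the kind of analysis described above, here is a minimal sketch that regresses one GPT-2 layer's activations on an earlier layer's via ordinary least squares. It is written under assumptions, not taken from the post's actual code: it assumes the Hugging Face transformers library, and the layer and dimension choices are arbitrary.

```python
import numpy as np
import torch
from transformers import GPT2Model, GPT2Tokenizer

# Load GPT-2 small and request hidden states from every layer.
tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

# Any text long enough to yield more tokens than regression parameters.
text = ("Least squares is the workhorse of statistics. It finds the "
        "coefficients that minimize the sum of squared residuals, and it "
        "shows up everywhere, from line fitting to probing the internal "
        "activations of large language models.")
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# hidden_states is a tuple of 13 tensors (embeddings + 12 layers),
# each of shape (1, seq_len, 768) for GPT-2 small.
h_early = out.hidden_states[2][0].numpy()  # predictors: layer-2 activations
h_late = out.hidden_states[8][0].numpy()   # target: layer-8 activations

# Illustrative choice: predict one layer-8 dimension from 20 layer-2
# dimensions, with an intercept column in the design matrix.
X = np.column_stack([h_early[:, :20], np.ones(h_early.shape[0])])
y = h_late[:, 0]

# Solve min ||X @ beta - y||^2 and report goodness of fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - resid.var() / y.var()
print(f"R^2 of the linear probe: {r2:.3f}")
```
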
Mike X Cohen, PhD:

Thanks NF! I think using ML to understand LLMs is a very powerful approach.

Rainbow Roxy:

Brilliant. Connecting least squares directly to GPT activations in this final part provides a wonderfully clear bridge between foundational stats and cutting-edge AI. How do you foresee the limitations of a linear approach manifesting when modeling the increasingly complex, emergent behaviors of larger, more recent LLMs?

Mike X Cohen, PhD:

There's an expression in statistics along the lines of "the unreasonable effectiveness of linear models." There are many examples in many disciplines (including my former profession as a neuroscientist) where linear models work surprisingly well at identifying patterns in complex nonlinear systems.

It's too big a topic just for this comment, but I think it's a combination of three things: (1) All smooth nonlinear functions can be locally approximated by linear functions if you zoom in enough (that's literally the foundation of calculus!). (2) Linear models capture the linear parts of nonlinear functions, which can be insightful or misleading. In low-dimensional systems we can visualize the data to see whether there's a good match, but in high dimensions we just hope the linear approximations are reasonable even if formally incorrect. (3) There really are linear relationships sandwiched between nonlinear transformations. That seems to be the case for the MLP sub-blocks of transformers (see, e.g., Anthropic's interpretability work).

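Point (1) is easy to demonstrate numerically: fit a least-squares line to a nonlinear function over windows of increasing width and watch the fit quality fall off. A minimal numpy sketch, an illustration rather than anything from the original post:

```python
import numpy as np

def linear_fit_r2(x, y):
    """Fit y ~ a*x + b by least squares and return R^2."""
    A = np.column_stack([x, np.ones_like(x)])  # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - resid.var() / y.var()

f = np.sin  # a nonlinear function
for half_width in [0.1, 0.5, 1.5, 3.0]:
    # Sample f over a window of this half-width centered at x = 1.
    x = np.linspace(1 - half_width, 1 + half_width, 200)
    print(f"window ±{half_width:>4}: R^2 = {linear_fit_r2(x, f(x)):.4f}")
```

Near x = 1 the linear fit is nearly perfect; over the widest window, where the sine bends back on itself, R^2 drops sharply.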