5 Comments
Mike X Cohen, PhD:

Least-squares *and* LLMs in the same post? Definitely something I want to read ☺️

Neural Foundry:

This series has been incredibly instructive. Applying least squares to model GPT activations is such a clever approach to understanding LLM internals. The progression from theory to GPT-2 analysis really demonstrates how foundational math translates into practical ML applications. Can't wait to dive into the code on GitHub.

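As a rough illustration of the kind of analysis described above, here is a minimal sketch that regresses one GPT-2 layer's activations on an earlier layer's via ordinary least squares. It is written under assumptions, not taken from the post's actual code: it assumes the Hugging Face transformers library, and the layer and dimension choices are arbitrary.

```python
import numpy as np
import torch
from transformers import GPT2Model, GPT2Tokenizer

# Load GPT-2 small and request hidden states from every layer.
tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

# Any text long enough to yield more tokens than regression parameters.
text = ("Least squares is the workhorse of statistics. It finds the "
        "coefficients that minimize the sum of squared residuals, and it "
        "shows up everywhere, from line fitting to probing the internal "
        "activations of large language models.")
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# hidden_states is a tuple of 13 tensors (embeddings + 12 layers),
# each of shape (1, seq_len, 768) for GPT-2 small.
h_early = out.hidden_states[2][0].numpy()  # predictors: layer-2 activations
h_late = out.hidden_states[8][0].numpy()   # target: layer-8 activations

# Illustrative choice: predict one layer-8 dimension from 20 layer-2
# dimensions, with an intercept column in the design matrix.
X = np.column_stack([h_early[:, :20], np.ones(h_early.shape[0])])
y = h_late[:, 0]

# Solve min ||X @ beta - y||^2 and report goodness of fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - resid.var() / y.var()
print(f"R^2 of the linear probe: {r2:.3f}")
```
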
Mike X Cohen, PhD:

Thanks NF! I think using ML to understand LLMs is a very powerful approach.

Rainbow Roxy:

Brilliant. Connecting least squares directly to GPT activations in this final part provides a wonderfully clear bridge between foundational stats and cutting-edge AI. How do you foresee the limitations of a linear approach manifesting when modeling the increasingly complex, emergent behaviors of larger, more recent LLMs?

Mike X Cohen, PhD:

There's an expression in statistics along the lines of "the unreasonable effectiveness of linear models." There are many examples in many disciplines (including my former profession as a neuroscientist) where linear models work surprisingly well at identifying patterns in complex nonlinear systems.

It's too big a topic just for this comment, but I think it's a combination of three things: (1) All smooth nonlinear functions can be locally approximated by linear functions if you zoom in enough (that's literally the foundation of calculus!). (2) Linear models capture the linear parts of nonlinear functions, which can be insightful or misleading. In low-dimensional systems we can visualize the data to see whether there's a good match, but in high dimensions we just hope the linear approximations are reasonable even if formally incorrect. (3) There really are linear relationships sandwiched between nonlinear transformations. That seems to be the case for the MLP sub-blocks of transformers (see, e.g., Anthropic's interpretability work).

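Point (1) is easy to demonstrate numerically: fit a least-squares line to a nonlinear function over windows of increasing width and watch the fit quality fall off. A minimal numpy sketch, an illustration rather than anything from the original post:

```python
import numpy as np

def linear_fit_r2(x, y):
    """Fit y ~ a*x + b by least squares and return R^2."""
    A = np.column_stack([x, np.ones_like(x)])  # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - resid.var() / y.var()

f = np.sin  # a nonlinear function
for half_width in [0.1, 0.5, 1.5, 3.0]:
    # Sample f over a window of this half-width centered at x = 1.
    x = np.linspace(1 - half_width, 1 + half_width, 200)
    print(f"window ±{half_width:>4}: R^2 = {linear_fit_r2(x, f(x)):.4f}")
```

Near x = 1 the linear fit is nearly perfect; over the widest window, where the sine bends back on itself, R^2 drops sharply.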