It's been a while since you heard from me. Summer is raging (38°C/100.4°F here), and I've been on a semi-vacation.
This is why you haven't seen any new posts; however, this doesn't mean I stopped working. There's a project I've been building in the past month or so, and it's time to let you know about it.
You know I'm writing a book about the internals of machine learning. It's a journey of theory and practice, where we build all the necessary mathematical foundations and then put them into code.
In the past month, I've been focusing on the code instead of the words, and slowly, it outgrew its original scope.
So, I decided to build a complete machine learning library for educational purposes. My goal is to teach you how the algorithms work. No hiding behind CUDA kernels and C code: every line is written in Python + NumPy, meant to be read like a book.
This became mlfz, which is now public, available on PyPI and GitHub. It's far from finished, but the computational graph and tensor library parts are ready. The documentation is intended to serve as a textbook, detailing the internals of machine learning.
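To give a flavor of what "read like a book" means, here is a minimal sketch of a reverse-mode computational graph over scalars, in plain Python. To be clear, this is not mlfz's actual API, just my own illustration of the style of code involved:

    class Scalar:
        """A node in a computational graph, storing its value,
        its gradient, and how to push gradients to its parents."""

        def __init__(self, value, parents=(), backward_fns=()):
            self.value = value
            self.grad = 0.0
            self.parents = parents            # nodes this one was computed from
            self.backward_fns = backward_fns  # local derivatives w.r.t. each parent

        def __add__(self, other):
            # d(a + b)/da = 1, d(a + b)/db = 1
            return Scalar(self.value + other.value,
                          parents=(self, other),
                          backward_fns=(lambda g: g, lambda g: g))

        def __mul__(self, other):
            # d(a * b)/da = b, d(a * b)/db = a
            return Scalar(self.value * other.value,
                          parents=(self, other),
                          backward_fns=(lambda g: g * other.value,
                                        lambda g: g * self.value))

        def backward(self, grad=1.0):
            # Accumulate the incoming gradient, then propagate it to the parents.
            self.grad += grad
            for parent, fn in zip(self.parents, self.backward_fns):
                parent.backward(fn(grad))

    x, y = Scalar(2.0), Scalar(3.0)
    z = x * y + x          # z = xy + x
    z.backward()           # dz/dx = y + 1 = 4, dz/dy = x = 2
    print(x.grad, y.grad)  # 4.0 2.0

Reverse-mode differentiation works by storing, for each operation, the local derivatives with respect to its inputs, then accumulating gradients from the output backward through the graph. That's all there is to it; the rest is bookkeeping.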
For the next few months, this will be my primary focus. From August, expect more posts about the intricate details of neural networks!


Tivadar, as a scientist I see gram-positive and gram-negative cocci in your illustration. Just a thought.
The classification theorem for finite simple groups might be a good candidate, though. The proof runs to tens of thousands of journal pages, published mostly between the 1950s and 2004. It was a huge deal when I was in college in the 1990s. The Wikipedia article is pretty good, but of course it assumes you know something about group theory to begin with. And I'll be honest: I don't remember most of the definitions involved, and have forgotten more than I remember.
Simple groups are the building blocks of all finite groups, in a loose sense similar to how prime numbers are the building blocks of the integers under multiplication. The Jordan–Hölder theorem makes this precise. So by classifying the finite simple groups, we have gained a massive leap in understanding how all finite groups work.
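For reference, the standard statement (textbook form, not specific to this discussion):

    \textbf{Jordan--H\"older theorem.} Every finite group $G$ has a composition series
    \[
      \{e\} = G_0 \trianglelefteq G_1 \trianglelefteq \cdots \trianglelefteq G_n = G,
    \]
    in which every quotient $G_{i+1}/G_i$ is simple. Any two composition series of $G$
    have the same length and the same simple quotients, up to reordering and isomorphism.
    These composition factors play the role of the prime factors of an integer.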
The Feit–Thompson theorem, which was probably the first glimpse that such a thing was possible, is about 255 pages of very dense mathematics. It was important enough that it took up an entire issue of the journal in question. The theorem says that every finite group of odd order is solvable (equivalently, every non-abelian finite simple group has even order). The proof was certainly the longest in group theory up to that point, and possibly the longest in any journal to that point. It's not the longest any more, though: one proof (Aschbacher and Smith, on quasithin groups) runs to about 1300 pages!
But the point here is that there was no reason to believe this was even possible, especially with such a small number of naturally occurring families: cyclic groups of prime order, alternating groups, the finite groups of Lie type, and 26 sporadic groups that don't fit anywhere else. Pretty amazing.
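Written out (again in its textbook form):

    \textbf{Classification of finite simple groups.} Every finite simple group is
    isomorphic to one of the following:
    \begin{itemize}
      \item a cyclic group $\mathbb{Z}/p\mathbb{Z}$ of prime order $p$;
      \item an alternating group $A_n$ with $n \ge 5$;
      \item a finite simple group of Lie type, such as $\mathrm{PSL}_n(q)$;
      \item one of the $26$ sporadic groups, the largest of which is the Monster.
    \end{itemize}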
As to why math problems are getting harder to solve, I'm not sure that's true. We've got better tools to tackle the hard problems now, but things have gotten somewhat specialized. For a non-specialist it's hard to even understand what some of the problems are asking: the Hodge Conjecture, one of the Millennium Prize Problems, is definitely in this category. And mathematicians want to generalize things as broadly as possible, so sometimes helpful details get lost. So the complexity may not be essential; it may lie in the presentation, and in the fact that lay people, even those in other specializations of math, don't know the jargon involved. This is, of course, a double-edged sword, since the more general questions can often be easier to tackle, having shed the unnecessary details that get in the way.
However, many problems that were super difficult in the past fall fairly easily now. Take partial differential equations, for instance. We have tons of numerical methods that didn't exist 100 years ago, along with the computers to run them on, so engineers and scientists don't do as much analytic equation solving. Numerical approximations are generally sufficient unless you're working in some unstable region of an equation or are trying to prove some qualitative thing about the system involved. And we can deal with many of those cases too.
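To make that concrete, here is a minimal sketch of the kind of numerical method meant here: an explicit finite-difference scheme for the 1D heat equation u_t = alpha * u_xx. The equation and the parameters are my own illustrative choices, not something from the discussion above.

    import numpy as np

    # Solve u_t = alpha * u_xx on [0, 1] with u(0, t) = u(1, t) = 0,
    # using the explicit scheme (forward Euler in time, central difference in space).
    alpha = 1.0
    nx, nt = 51, 2000
    dx = 1.0 / (nx - 1)
    dt = 0.4 * dx**2 / alpha  # within the stability limit dt <= dx^2 / (2 * alpha)

    x = np.linspace(0.0, 1.0, nx)
    u = np.sin(np.pi * x)  # initial condition

    for _ in range(nt):
        # u_xx approximated by the second central difference; boundaries stay zero
        u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])

    # This initial condition has the exact solution exp(-pi^2 * alpha * t) * sin(pi * x),
    # so we can check the error directly.
    t = nt * dt
    exact = np.exp(-np.pi**2 * alpha * t) * np.sin(np.pi * x)
    print(np.max(np.abs(u - exact)))  # small; diverges if dt exceeds the stability limit

Note the restriction on the time step: explicit schemes like this one only converge when dt respects the stability limit, which is one concrete sense in which a bit of analysis is still needed even when the computer does the solving.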