It's been a while since you heard from me. Summer is raging (38°C/100.4°F here), and I've been on a semi-vacation.
This is why you haven't seen any new posts; however, this doesn't mean I stopped working. There's a project I've been building in the past month or so, and it's time to let you know about it.
You know I'm writing a book about the internals of machine learning. It's a journey of theory and practice, where we build all the necessary mathematical foundations and then put them into code.
In the past month, I've been focusing on the code instead of the words, and slowly, it grew out of scope.
So, I decided to build a complete machine learning library for educational purposes. My goal is to teach you how the algorithms work: no hiding behind CUDA kernels and C code. Every line is written in Python + NumPy, meant to be read like a book.
This became mlfz, which is now public and available on PyPI and GitHub. It's far from finished, but the computational graph and tensor library parts are ready. The documentation is intended to serve as a textbook, detailing the internals of machine learning.
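To give a taste of what a readable computational graph looks like, here is a minimal sketch of scalar reverse-mode autodiff in plain Python. This is a generic illustration of the technique, not mlfz's actual API; the class and method names are made up for this example.

```python
class Scalar:
    """A node in a computational graph, storing its value, its gradient,
    and closures that propagate gradients back to its parents."""

    def __init__(self, value, parents=(), backward_fns=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents            # upstream nodes
        self.backward_fns = backward_fns  # one d(output)/d(parent) per parent

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1, so gradients pass through.
        return Scalar(
            self.value + other.value,
            parents=(self, other),
            backward_fns=(lambda g: g, lambda g: g),
        )

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a.
        return Scalar(
            self.value * other.value,
            parents=(self, other),
            backward_fns=(lambda g: g * other.value, lambda g: g * self.value),
        )

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, visited = [], set()

        def visit(node):
            if id(node) not in visited:
                visited.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, fn in zip(node.parents, node.backward_fns):
                parent.grad += fn(node.grad)


x, y = Scalar(2.0), Scalar(3.0)
z = x * y + x          # z = xy + x
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```

Every deep learning framework is built around this idea; the real work is generalizing it from scalars to tensors.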
For the next few months, this will be my primary focus. From August, expect more posts about the intricate details of neural networks!
Tivadar, as a scientist, I see gram-positive and gram-negative cocci in your illustration. Just a thought.
With respect to machine learning, I would recommend having a look at NumPy, matplotlib, SciPy, and especially http://scikit-learn.org/ (a standard machine learning library for Python). I would try to implement the respective algorithms on my own in an efficient way, then compare the results against those libraries or write some add-ons (like adding a new class of distributions).
Generally, for getting started with Python, I would like to mention a different approach. When I taught Python to a motivated group of first-semester students in Biology, I initially asked them what they were interested in learning. From their answers, I assembled a quite complex task for beginners:
They were supposed to set up an artificial ecosystem of mice and cats (both male and female, with an age, natural death rate, movement rules, etc.) that ends up in a Lotka-Volterra-like equilibrium. Afterwards, we checked that together by fitting the Lotka-Volterra equations to the simulated data. The system and the fits were visualized with some animated graphs and a 2D arena showing the individuals as little moving pixels. The students liked this approach very much, because it was a hands-on strategy involving most of the basic ideas: loops and conditions, lists, NumPy arrays, functions, classes, plotting, GUIs, and animation, as well as concepts related to modelling and simulation like probability theory, statistics and data analysis, differential equations, dynamical systems, and translating ideas from math to algorithms and back.
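For reference, the dynamics the fits target can be sketched in a few lines of NumPy. This is not the students' actual code, just a minimal Euler integration of the Lotka-Volterra equations with arbitrarily chosen parameters:

```python
import numpy as np

def lotka_volterra(prey0, pred0, alpha, beta, delta, gamma,
                   dt=0.001, steps=100_000):
    """Euler-integrate the Lotka-Volterra equations:
        dx/dt = alpha*x - beta*x*y   (prey)
        dy/dt = delta*x*y - gamma*y  (predators)
    Returns the two population trajectories as NumPy arrays."""
    x, y = prey0, pred0
    xs, ys = np.empty(steps), np.empty(steps)
    for i in range(steps):
        xs[i], ys[i] = x, y
        dx = (alpha * x - beta * x * y) * dt
        dy = (delta * x * y - gamma * y) * dt
        x, y = x + dx, y + dy
    return xs, ys

prey, pred = lotka_volterra(prey0=10.0, pred0=5.0,
                            alpha=1.1, beta=0.4, delta=0.1, gamma=0.4)
# Both populations oscillate around the fixed point (gamma/delta, alpha/beta).
```

Fitting these parameters to counts produced by the agent-based mice-and-cats simulation is what closes the loop between the individual-level rules and the population-level equations.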
I will never forget the moment when the students realized that what they had just done over the last seven 90-minute sessions plus homework was nothing other than implementing the difficult math from their math course without noticing it, just by playing around and bringing in their own ideas on how to make the system more realistic. All of that started from the concept of a simple for-loop and a list. They were extremely fascinated by watching those moving dots on their screens, appearing, mating, dying, and finally, after some parameter tuning, settling into a steady-state solution.
The range of course topics was quite broad given the time, and we did not use any prepared script, just hands-on coding, but it worked out very well and the feedback was positive.
One of the six students is now doing a PhD in neuroscience, another one in evolutionary ecology.
From that experience, I can highly recommend such an approach for learning programming in general, as it involves most of the important concepts and ideas in a playful, but still science-related, way. For me, that would also be the project of choice for getting acquainted with the concepts of a new programming language.