Mathematics of Machine Learning official release announcement!
Yes, there will be a physical version
Four years ago, I started a project that turned out to be life-altering, and it has just passed a huge milestone.
I'm happy to announce that my Mathematics of Machine Learning book will soon be published by Packt Publishing! Yes, there will be a physical version, and you can pre-order it right now! The paperback version is $39.99, which is a heavily discounted price. (The final price will be $59.99 after publication, expected in May.)
I started the book in early 2021, right after losing my life savings on a startup I co-founded. (We have made every single textbook mistake.) To recharge my batteries, I started writing about the only two topics I really understood: mathematics and machine learning.
First, there were a couple of educational posts about the expected value and the mean squared error. Writing those posts was the most fun I'd had in years, and you liked them, too. So, I wrote some more. And some more.
It was not long before I realized I had written the first couple of chapters of an imaginary math textbook. One that I had always wanted but never had. A book that walks the fine line between practice and theory, where we are motivated by machine learning, taking the math seriously without getting lost in the ivory tower.
So, I decided to make the dream real.
I announced the project, and you loved it. Thus, early access was born, and I spent the next couple of years working vigorously on the book. And now, it's ready.
One of the most common questions I get from engineers and data scientists is, why study math? After all, you can start machine learning in minutes by loading a pre-trained model and crunching out predictions immediately. The abstractions provided by frameworks such as PyTorch or scikit-learn take away the technical details and enable you to build models faster than ever before.
However (and I am speaking from painful personal experience here), building models from data is still a hard problem. To use a car analogy: machine learning is not a simple cruise on the highway. It's more like operating heavy machinery on swampy ground to build a skyscraper. You must understand what's under the hood to do the job well.
The mathematics of machine learning rests upon three pillars:
linear algebra for describing data and building models,
calculus to optimize them,
and probability theory to reason under uncertainty.
In a whopping ~500 pages, I'll tell you all you need to know about these. From vector spaces through gradient descent to the concept of entropy, I explain all of the essentials in a clear, richly illustrated manner, all with reference implementations in Python. Yes, we'll implement Gram-Schmidt orthogonalization, the QR algorithm, gradient descent, and many more from scratch. If you are a frequent reader, you know my style.
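To give you a taste of what "from scratch" means here, below is a minimal single-variable gradient descent sketch in plain Python. The objective function and step size are illustrative placeholders of my own choosing, not excerpts from the book; the versions in the book are more general (multivariable and NumPy-based).

```python
# A minimal sketch of from-scratch gradient descent on a single-variable function.
# The function and its derivative are placeholder examples, not taken from the book.

def gradient_descent(df, x0, learning_rate=0.1, n_steps=100):
    """Minimize a function of one variable, given its derivative df."""
    x = x0
    for _ in range(n_steps):
        x = x - learning_rate * df(x)  # take a step against the derivative
    return x

# Example: minimizing f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)  # converges to approximately 3.0
```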
To each of you who supported me through early access: I cannot thank you enough. Your support made it possible for me to keep working on the project, and your feedback made this book a whole lot better. Thank you! <3
Finally, here's a preview in the form of some of my earlier posts, taken right from the book. You can also find the preliminary table of contents below. (It is very likely to be the final one.)
Here’s the table of contents.
Part I. Linear Algebra
1. Vectors and Vector Spaces
1.1 What is a vector space?
1.2 Linear basis
1.3 Vectors in practice
2. The Geometric Structure of Vector Spaces
2.1 Norms and distances
2.2 Inner products, angles, and reasons to care about them
3. Linear Algebra in Practice
3.1 Vectors in NumPy
3.2 Matrices, the workhorses of linear algebra
4. Linear Transformations
4.1 What is a linear transformation?
4.2 Change of basis
4.3 Linear transformations in the Euclidean plane
4.4 Determinants and volume
5. Matrices and Equations
5.1 Linear equations
5.2 The LU decomposition
5.3 Determinants in practice
6. Eigenvalues and Eigenvectors
6.1 Eigenvalues of matrices
6.2 Finding eigenvalue-eigenvector pairs
6.3 Eigenvectors, eigenspaces, and their bases
7. Matrix Factorizations
7.1 Special transformations
7.2 Spectral decomposition theorem
7.3 Singular Value Decomposition
7.4 Orthogonal projections
7.5 Computing eigenvalues
7.6 The QR algorithm
8. Matrices and Graphs
8.1 The directed graph of a nonnegative matrix
8.2 Benefits of the graph representation
8.3 The Frobenius normal form
Part II. Calculus
9. Functions
9.1 Functions in theory
9.2 Functions in practice
10. Numbers, Sequences, and Series
10.1 Numbers
10.2 Sequences
10.3 Series
11. The Structure of Real Numbers
11.1 Topology
11.2 Limits
11.3 Continuity
12. Differentiation
12.1 Differentiation in theory
12.2 Differentiation in practice
13. Integration
13.1 Integration in theory
13.2 Integration in practice
14. Optimization
14.1 Minima and maxima
14.2 Derivatives and local behavior
14.3 Gradient descent basics
14.4 Why gradient descent works
Part III. Multivariable Calculus
15. Multivariable Functions
15.1 What is a multivariable function?
15.2 Linear functions in multiple variables
15.3 The curse of dimensionality
16. Derivatives and Gradients
16.1 Partial derivatives
16.2 Derivatives of vector-valued functions
17. Optimization in Multiple Variables
17.1 Minima and maxima, revisited
17.2 Gradient descent in full form
Part IV. Probability Theory
18. What is Probability?
18.1 The language of thinking
18.2 The axioms of probability
18.3 Conditional probability
19. Random Variables and Distributions
19.1 Random variables
19.2 Discrete distributions
19.3 Real-valued distributions
19.4 Density functions
20. The Expected Value
20.1 Discrete random variables
20.2 Continuous random variables
20.3 Properties of the expected value
20.4 Variance
20.5 The Law of Large Numbers
21. Maximum Likelihood Estimation
21.1 Probabilistic modeling 101
21.2 Modeling heights
21.3 The general method
21.4 The German tank problem