Mathematics of Machine Learning official release announcement!
Yes, there will be a physical version
Four years ago, I started a project that turned out to be life-altering, and it has just passed a huge milestone.
I'm happy to announce that my Mathematics of Machine Learning book will soon be published by Packt Publishing! Yes, there will be a physical version, and you can pre-order it right now! The paperback version is $39.99, which is a heavily discounted price. (The final price will be $59.99 after publication, expected in May.)
I started the book in early 2021, right after losing my life savings on a startup I co-founded. (We have made every single textbook mistake.) To recharge my batteries, I started writing about the only two topics I really understood: mathematics and machine learning.
First, there were a couple of educational posts about the expected value and the mean squared error. Writing those posts was the most fun I'd had in years, and you liked them, too. So, I wrote some more. And some more.
It was not long before I realized I had written the first couple of chapters of an imaginary math textbook. One that I had always wanted but never had. A book that walks the fine line between practice and theory, where we are motivated by machine learning, taking the math seriously without getting lost in the ivory tower.
So, I decided to make the dream real.
I announced the project, and you loved it. Thus, early access was born, and I spent the next couple of years working vigorously on the book. And now, it's ready.
One of the most common questions I get from engineers and data scientists is, why study math? After all, you can start machine learning in minutes by loading a pre-trained model and crunching out predictions immediately. The abstractions provided by frameworks such as PyTorch or scikit-learn take away the technical details and enable you to build models faster than ever before.
However (and I am speaking from painful personal experience here), building models from data is still a hard problem. To use a car analogy: machine learning is not a simple cruise on the highway. It's more like operating heavy machinery on swampy ground to build a skyscraper. You must understand what's under the hood to do the job well.
The mathematics of machine learning rests upon three pillars:
linear algebra for describing data and building models,
calculus to optimize them,
and probability theory to reason under uncertainty.
In a whopping ~500 pages, I'll tell you all you need to know about these. From vector spaces through gradient descent to the concept of entropy, I explain all of the essentials in a clear, richly illustrated manner, all with reference implementations in Python. Yes, we'll implement Gram-Schmidt orthogonalization, the QR algorithm, gradient descent, and many more from scratch. If you are a frequent reader, you know my style.
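To give you a taste of what "from scratch" means here, below is a minimal single-variable gradient descent sketch in plain Python. The objective function and step size are illustrative placeholders of my own choosing, not excerpts from the book; the versions in the book are more general (multivariable and NumPy-based).

```python
# A minimal sketch of from-scratch gradient descent on a single-variable function.
# The function and its derivative are placeholder examples, not taken from the book.

def gradient_descent(df, x0, learning_rate=0.1, n_steps=100):
    """Minimize a function of one variable, given its derivative df."""
    x = x0
    for _ in range(n_steps):
        x = x - learning_rate * df(x)  # take a step against the derivative
    return x

# Example: minimizing f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)  # converges to approximately 3.0
```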
To each of you who supported me through early access: I cannot thank you enough. Your support made it possible for me to keep working on the project, and your feedback made this book a whole lot better. Thank you! <3
Finally, here's a preview in the form of some of my earlier posts, taken right from the book. You can also find the preliminary table of contents below. (It is very likely to be the final one.)
Here’s the table of contents.
Part I. Linear Algebra
1. Vectors and Vector Spaces
1.1 What is a vector space?
1.2 Linear basis
1.3 Vectors in practice
2. The Geometric Structure of Vector Spaces
2.1 Norms and distances
2.2 Inner products, angles, and reasons to care about them
3. Linear Algebra in Practice
3.1 Vectors in NumPy
3.2 Matrices, the workhorses of linear algebra
4. Linear Transformations
4.1 What is a linear transformation?
4.2 Change of basis
4.3 Linear transformations in the Euclidean plane
4.4 Determinants and volume
5. Matrices and Equations
5.1 Linear equations
5.2 The LU decomposition
5.3 Determinants in practice
6. Eigenvalues and Eigenvectors
6.1 Eigenvalues of matrices
6.2 Finding eigenvalue-eigenvector pairs
6.3 Eigenvectors, eigenspaces, and their bases
7. Matrix Factorizations
7.1 Special transformations
7.2 Spectral decomposition theorem
7.3 Singular Value Decomposition
7.4 Orthogonal projections
7.5 Computing eigenvalues
7.6 The QR algorithm
8. Matrices and Graphs
8.1 The directed graph of a nonnegative matrix
8.2 Benefits of the graph representation
8.3 The Frobenius normal form
Part II. Calculus
9. Functions
9.1 Functions in theory
9.2 Functions in practice
10. Numbers, Sequences, and Series
10.1 Numbers
10.2 Sequences
10.3 Series
11. The Structure of Real Numbers
11.1 Topology
11.2 Limits
11.3 Continuity
12. Differentiation
12.1 Differentiation in theory
12.2 Differentiation in practice
13. Integration
13.1 Integration in theory
13.2 Integration in practice
14. Optimization
14.1 Minima and maxima
14.2 Derivatives and local behavior
14.3 Gradient descent basics
14.4 Why gradient descent works
Part III. Multivariable Calculus
15. Multivariable Functions
15.1 What is a multivariable function?
15.2 Linear functions in multiple variables
15.3 The curse of dimensionality
16. Derivatives and Gradients
16.1 Partial derivatives
16.2 Derivatives of vector-valued functions
17. Optimization in Multiple Variables
17.1 Minima and maxima, revisited
17.2 Gradient descent in full form
Part IV. Probability Theory
18. What is Probability?
18.1 The language of thinking
18.2 The axioms of probability
18.3 Conditional probability
19. Random Variables and Distributions
19.1 Random variables
19.2 Discrete distributions
19.3 Real-valued distributions
19.4 Density functions
20. The Expected Value
20.1 Discrete random variables
20.2 Continuous random variables
20.3 Properties of the expected value
20.4 Variance
20.5 The Law of Large Numbers
21. Maximum Likelihood Estimation
21.1 Probabilistic modeling 101
21.2 Modeling heights
21.3 The general method
21.4 The German tank problem