Machine Learning From Zero, Chapter 01
What is Machine Learning?
Hey! This is Tivadar from The Palindrome.
I’m finally working on my upcoming book Machine Learning From Zero, the sequel to Mathematics of Machine Learning.
I've just finished the first chapter, which lays the foundations for the neat stuff, like implementing neural networks from scratch. (That's the core of the book.)
Without further ado, here’s an exclusive preview.
Enjoy!
P.S. I’m writing this book in Jupyter Notebooks, and I turn them into Substack posts with NotebookPress, a tool I’m building to bring technical writing on Substack to the next level. If you write math-and-code-heavy content, you should check it out.
Machine learning is training predictive models from data.
Sure, we can be academic about it and refine the definition of machine learning by examining its countless nuances, but that's not what we are here to do. We are here to understand the core fundamentals of machine learning, the fundamentals that will take you further than anything else.
I believe the only way to acquire deep knowledge of any technical subject is to take it apart and put it back together again. This is what we are here to do.
The fundamental machine learning setup consists of:
a dataset, usually coming in the form of input and target variables,
a parametric function that models the relation between the input and the target variables,
and a loss function that measures the model’s fit to the data.
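To make this setup concrete, here is a minimal sketch in Python. The dataset, the linear model `h`, and the mean squared error loss are all hypothetical stand-ins chosen for illustration, not the book's actual examples:

```python
import numpy as np

# A toy dataset: inputs x and noisy targets y from an assumed linear relation
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=20)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=20)

def h(x, w):
    """A parametric model: a line with parameters w = (slope, intercept)."""
    return w[0] * x + w[1]

def mse_loss(w, x, y):
    """A loss function: mean squared error between predictions and targets."""
    return np.mean((h(x, w) - y) ** 2)

# A poor guess for the parameters yields a higher loss than a good one
print(mse_loss(np.array([0.0, 0.0]), x, y))
print(mse_loss(np.array([3.0, 0.5]), x, y))
```

Every piece of the setup above appears here: the data `(x, y)`, the parametric model `h`, and the loss that scores how well a given `w` fits the data.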
Let’s start with the data.
To look behind the curtain of machine learning algorithms, we have to precisely formulate the problems that we deal with. Three important parameters determine a machine learning paradigm: the input, the output, and the training data.
All machine learning tasks boil down to finding a model that provides additional insight into the data, i.e., a function f that transforms the input x into a useful representation y. This can be a prediction, an action to take, a high-level feature representation, and much more. We'll learn about all of them.
Mathematically speaking, the basic machine learning setup consists of:
a dataset 𝒟,
a function f that describes the true relation between the input and the output,
and a parametric model h — also called a hypothesis — that serves as our estimation of f.
Remark. (Common abuses of machine learning notation.)
Note that although the function f only depends on the input x, the parametric model h also depends on the parameters and the training dataset.
Thus, it is customary to write h(x) as h(x; w, 𝒟), where w represents the parameters, and 𝒟 is our training dataset.
This dependence is often omitted, but keep in mind that it’s always there.
We make no restrictions on how the model h is constructed. It can be a deterministic function like h(x) = −13.2 x² + 0.92 x + 3.0, or a probability distribution h(x) = P(Y = y ∣ X = x). Models come in all kinds of families, such as generative and discriminative. We'll talk about them in detail; in fact, models will be the focal points of the majority of the chapters.
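The deterministic quadratic above also illustrates the notational remark: written with its parameters made explicit, it is really h(x; w) with w = (−13.2, 0.92, 3.0). A quick sketch:

```python
def h(x, w):
    """The quadratic model from the text, with its parameters made explicit.

    With w = (-13.2, 0.92, 3.0), this is h(x) = -13.2 x^2 + 0.92 x + 3.0.
    """
    a, b, c = w
    return a * x**2 + b * x + c

w = (-13.2, 0.92, 3.0)
print(h(0.0, w))  # the constant term: 3.0
print(h(1.0, w))  # -13.2 + 0.92 + 3.0
```

Writing h(x) hides w for brevity, but changing w changes the function, which is exactly the dependence the remark warns about.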
First, let’s focus on the paradigms themselves. There are four major ones:
supervised learning,
unsupervised learning,
semi-supervised learning,
and reinforcement learning.
What are these?
Supervised learning
The most common paradigm is supervised learning. There, we have inputs 𝐱ᵢ ∈ ℝᵐ and ground truth labels yᵢ that form our training dataset 𝒟 = {(𝐱ᵢ, yᵢ) : i = 1, …, n}.
The labels can be anything, from numbers to text, but the key point is that they are all available to us. The goal is to construct a function that models the relationship between the input x and the target variable y.
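As a minimal supervised-learning sketch, here is a hypothetical labeled dataset and an ordinary least-squares fit of a line to it (the data points and the use of `np.polyfit` are illustrative choices, not the book's method):

```python
import numpy as np

# A hypothetical supervised dataset: inputs x_i with ground-truth labels y_i
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])  # roughly follows y = 2x + 1

# Fit the model h(x) = w[0] * x + w[1] by least squares
w = np.polyfit(x, y, deg=1)
print(w)  # slope near 2, intercept near 1
```

Because every input comes paired with its label, we can directly measure and minimize the mismatch between h(𝐱ᵢ) and yᵢ, which is the defining feature of supervised learning.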
Let’s saddle up and see a couple of examples.