Linear regression (is the most important algorithm ever)
Dissecting the "Hello, World!" of machine learning
This post is the next chapter of my upcoming Mathematics of Machine Learning book, available in early access.
New chapters are also available to premium subscribers of The Palindrome, but 40+ chapters (~450 pages) are available exclusively to early access members.
Now that we understand what machine learning is, we are ready to implement its "Hello, World!": linear regression.
However, unlike the "Hello, World!" program, linear regression is not just a warm-up exercise; it is the most important machine learning model. Besides teaching us the fundamentals, linear regression is everywhere. For instance, linear models are the soul of neural networks.
In the following, we'll learn everything about this simple model. And I do mean everything: linear regression has quite a few different faces. We'll take a look at all of them, and implement our own custom linear regressor from scratch.
The base model
Let's consider a basic regression problem: we have

- two variables x and y (for example, the square footage and the price of a property, or the study hours and the final test score of a student),
- and a set of observations D = {(xᵢ, yᵢ): i = 1, …, n}.
Our goal is to predict y from x; that is, the dependent variable from the independent variable. In other words, we are looking for a function h: ℝ → ℝ that fits our observations as well as possible.
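To have something concrete to play with, here is a minimal sketch of such a dataset in Python with NumPy. (The tooling is my assumption, and the "true" relationship and noise level are made-up illustration values.)

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic observations D = {(x_i, y_i)}: square footage vs. price,
# generated from a noisy linear relationship (illustration values only)
x = rng.uniform(50, 200, size=100)                       # square footage
y = 3000 * x + 50_000 + rng.normal(0, 20_000, size=100)  # price, with noise
```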
There are two key design choices that determine our machine learning solution:

- the family of functions we pick our h from (that is, our hypothesis space),
- and the measure of fit (that is, our loss function).
Linear regression represents the simplest possible choices: we are looking for a function of the form h(x) = ax + b, where a, b are two real parameters, and we measure the fit by the so-called mean squared error.
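Concretely, the mean squared error of a hypothesis h on our dataset D is

MSE(h) = (1/n) ∑ᵢ (h(xᵢ) − yᵢ)²,

where the sum runs over all n observations: the average squared difference between our predictions and the observed values.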
Both of these components have a simple visual interpretation. For one, the graph of a function of the form ax + b is a straight line, hence the name linear functions.
Geometrically, a is the slope of the line, while b is where the line intersects the y-axis. In regression terminology, a is the regression coefficient, while b is the intercept of the linear model.
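To preview where we are headed, here is a minimal from-scratch sketch of a linear regressor, again assuming Python with NumPy. It fits a and b via the classical closed-form least-squares solution, which falls out of setting the derivatives of the MSE to zero; take it on faith for now, since we haven't derived it yet.

```python
import numpy as np

def fit_linear(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Fit h(x) = a*x + b by minimizing the mean squared error."""
    x_mean, y_mean = x.mean(), y.mean()
    # Closed-form least squares: a = Cov(x, y) / Var(x),
    # and b is chosen so the line passes through the point of means
    a = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    b = y_mean - a * x_mean
    return a, b

def mse(x: np.ndarray, y: np.ndarray, a: float, b: float) -> float:
    """Mean squared error of h(x) = a*x + b on the observations."""
    return float(((a * x + b - y) ** 2).mean())

# Fit on the synthetic square-footage/price data from the earlier sketch
rng = np.random.default_rng(42)
x = rng.uniform(50, 200, size=100)
y = 3000 * x + 50_000 + rng.normal(0, 20_000, size=100)

a, b = fit_linear(x, y)
print(f"a = {a:.2f}, b = {b:.2f}, MSE = {mse(x, y, a, b):.2f}")
```

The recovered slope and intercept land close to the values used to generate the data; the residual MSE reflects the injected noise.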