# Introduction to computational graphs

### Laying the groundwork of our custom neural network framework

Hi there! It's Tivadar from The Palindrome. This publication is possible because of your support.

If you enjoy my posts, share them with your friends and upgrade your subscription! I regularly post deep-dive explainers from the intersection of mathematics and machine learning.

In computer science and mathematics, constructing a new representation of an old concept can kickstart new fields. One of my favorite examples is the connection between graphs and matrices: representing one as the other has proven to be extremely fruitful for both objects. Translating graph theory to linear algebra and vice versa was the key to numerous hard problems.

What does this have to do with machine learning? If you have ever seen a neural network, you know the answer.

From a purely mathematical standpoint, a neural network is a composition of a sequence of functions, say, *N*(**x**) = Softmax(Linear₂(ReLU(Linear₁(**x**)))), whatever those mystical Softmax, Linear, and ReLU functions might be.

On the other hand, take a look at this figure below. Chances are, you have already seen something like this.

This is the computational graph representation of the neural network *N*. While the expression *N*(**x**) = Softmax(Linear₂(ReLU(Linear₁(**x**)))) describes *N* on the macro level, the graph zooms into the micro domain.

This is more than just a fun visualization tool. Computational graphs provide us with a way to manage the complexity of large models like a deep neural network.

To give you a taste of how badly we need this, try calculating the derivative of *N*(*x*) by hand. Doing this on paper is daunting enough, let alone turning it into an efficient implementation. Computational graphs solve this problem brilliantly, and in the process, they make machine learning computationally feasible.

Here are the what, why, and how of computational graphs.

(There will be some heavy coding involved. Feel free to follow along and experiment in this Google Colab notebook!)

**What is a computational graph?**

Let's go back to square one and take a look at the expression *c*(*a* + *b*). Although this merely seems like a three-variable function defined by *f*(*a*, *b*, *c*) = *c*(*a* + *b*), we can decompose it further, down to its computational atoms. Unraveling the formula, we see that *f* is the composition of the two functions *f*₁(*a*, *b*) = *a* + *b* and *f*₂(*a*, *b*) = *ab* via *f*(*a*, *b*, *c*) = *f*₂(*c*, *f*₁(*a*, *b*)).

Even though this is a simple example, the expression *f*₂(*c*, *f*₂(*a*, *b*)) is already not the nicest to look at. You can imagine how quickly the complexity slips out of control; just try to re-write (*a* + *b*)(*e*(*c* + *d*) + *f*) in the same manner. So, we need a more expressive notation.
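To see the blow-up concretely, here is that expression spelled out purely in terms of *f*₁ (addition) and *f*₂ (multiplication), as a quick Python check (the variable values are arbitrary):

```python
def f1(x, y):
    return x + y  # addition

def f2(x, y):
    return x * y  # multiplication

# (a + b) * (e * (c + d) + f), written with only f1 and f2:
a, b, c, d, e, f = 1, 2, 3, 4, 5, 6
nested = f2(f1(a, b), f1(f2(e, f1(c, d)), f))
print(nested == (a + b) * (e * (c + d) + f))  # True
```

Correct, but already nearly unreadable — which is exactly the point.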

What if, instead of using an algebraic expression, we build a graph where the components *a*, *b*, *c*, *f*₁, and *f*₂ are nodes, and the connections represent the inputs to each component?

It's easier to show than tell, so this is what I am talking about.

In other words, the above figure depicts a tree graph, where

- the leaf nodes (such as *a*, *b*, and *c*) are the input variables, and
- the inner nodes (such as *f*₁ and *f*₂) are computations, with the children nodes as their inputs.

Because this bears a resemblance to how brain cells communicate, these nodes are called *neurons*. Hence the term *neural network*, which is just a fancy name for computational graphs.

Now, the graph above is just a symbolic representation for the function *f*(*a*, *b*, *c*) = *c*(*a* + *b*). How do we perform the actual computations? We start from the inputs at the leaf nodes, and work our way up step by step. Here's an illustration using our recurring example *c*(*a* + *b*) with the inputs *a* = 1, *b* = 3, and *c* = 2.

Once the leaves are initialized, the computation takes two steps to complete.

This process is called the *forward pass*, as with each step, we propagate the initial values forward in the computational graph, flowing from the leaves towards the root node. Think of this as a function call. Say, if our computational graph encodes a machine learning model, the forward pass results in a prediction.
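The two steps above can be traced in plain Python (plain numbers for now; our graph-building version comes later):

```python
a, b, c = 1, 3, 2

f1 = a + b   # step 1: evaluate the inner node, 1 + 3 = 4
f = c * f1   # step 2: evaluate the root node,  2 * 4 = 8

print(f)  # 8
```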

Simple as they are, computational graphs provide a tremendous advantage in practice. So, let's create our own framework!

**Laying the groundwork of our computational graph framework**

Mathematically speaking, computational graphs are nothing special. However, things change significantly when we move to the computational realm. Computational graphs provide such an effective framework that training large neural networks would not be feasible without utilizing the clever algorithms available on computational graphs. (You might have heard about backpropagation; we'll study and implement it in the next episode. Safe to say, backpropagation is one of the pillars of deep learning.)

So, it's time to put what we've learned so far into code. Fasten your seatbelts.

(There are several libraries out there whose purpose is to provide a clear and simple implementation of computational graphs; Andrej Karpathy's micrograd and George Hotz's tinygrad are the ones that come to mind. I have taken significant inspiration from both. Essentially, we'll build our own micrograd library in the following posts, one step at a time.)

As we've discussed, a computational graph is made out of neurons, that is, units of computation.

What properties does a neuron have, and what can it do? Simple. Each one has

- a scalar value,
- a set of inputs,
- and a computational rule describing how to obtain that value from the inputs.

As each neuron acts as a function, the most straightforward way would be to create a `Neuron` class, add a `__call__` method to make it callable, then subclass it for each possible function and override the function call. However, this would require a separate class for each component, quickly leading to an uncontrollable proliferation of classes.

Thus, we'll take the other route: supercharge the operations and functions themselves, then build the graphs dynamically by applying them. In other words, instead of working with functions, we supercharge the numeric variables to do the work for us.

Again, it's better to show than tell, so let's get to coding! Here's the skeleton of our beefed-up `Scalar` class.

(I've also added a human-readable string representation for convenience.)
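If you're following along without the notebook, a minimal sketch of such a skeleton might look like this (the attribute names `value` and `prevs` are my choice here and may differ from the notebook's):

```python
class Scalar:
    """A node in the computational graph, holding a single scalar value."""

    def __init__(self, value, prevs=()):
        self.value = value    # the scalar value of this node
        self.prevs = prevs    # the nodes this value was computed from

    def __repr__(self):
        # human-readable string representation
        return f"Scalar({self.value})"
```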

What about the computations? Let's start with addition, multiplication, and exponentiation.

In Python, the magic method `__add__` determines how the class behaves with respect to the addition operator `+`. Similarly, `__mul__` is responsible for the multiplication `*`, and `__pow__` is for the exponentiation `**`.
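A sketch of how those three magic methods could build the graph on top of the skeleton above (again, the attribute names are my assumption):

```python
class Scalar:
    def __init__(self, value, prevs=()):
        self.value = value    # the scalar value of this node
        self.prevs = prevs    # the nodes this value was computed from

    def __repr__(self):
        return f"Scalar({self.value})"

    def __add__(self, other):
        # a + b: the new node records both operands as its inputs
        return Scalar(self.value + other.value, prevs=(self, other))

    def __mul__(self, other):
        # a * b
        return Scalar(self.value * other.value, prevs=(self, other))

    def __pow__(self, exponent):
        # a ** n, where the exponent is a plain number, not a Scalar
        return Scalar(self.value ** exponent, prevs=(self,))
```

Notice that every operation returns a *new* `Scalar` that remembers where it came from — this is what turns arithmetic into graph-building.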

Let's see `Scalar` in action on our recurring example *c*(*a* + *b*).

As expected, `f` is indeed a `Scalar` with a value equal to 8, and its previous `Scalar` objects are `c` and `a + b`. (Where `a + b` is not an explicitly named variable.)
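With the operator overloads sketched above, the computation could look like this (self-contained so you can run it directly; the class is a condensed version of the earlier sketch):

```python
class Scalar:
    def __init__(self, value, prevs=()):
        self.value = value
        self.prevs = prevs

    def __repr__(self):
        return f"Scalar({self.value})"

    def __add__(self, other):
        return Scalar(self.value + other.value, prevs=(self, other))

    def __mul__(self, other):
        return Scalar(self.value * other.value, prevs=(self, other))

a, b, c = Scalar(1), Scalar(3), Scalar(2)
f = c * (a + b)          # builds the graph while computing the value

print(f)        # Scalar(8)
print(f.prevs)  # (Scalar(2), Scalar(4))
```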

Similarly, we can easily create functions that act on `Scalar` objects. Here's the famous Sigmoid function.
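A possible version of such a function, under the same assumed `Scalar` sketch:

```python
import math

class Scalar:
    def __init__(self, value, prevs=()):
        self.value = value
        self.prevs = prevs

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)); the result node records x as its input
    return Scalar(1 / (1 + math.exp(-x.value)), prevs=(x,))

s = sigmoid(Scalar(0.0))
print(s.value)  # 0.5
```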

We're here to do machine learning, so how about implementing a linear regression model? First, the computational graph of *ax* + *b*.

Nothing we haven't seen before; in fact, structurally, it is the same as the graph of *c*(*a* + *b*). The only difference is the computations carried out by the nodes.

Now, the implementation with our computational graph framework. (Well, soon-to-be framework.)
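Under the same assumptions, the `linear_regression` function might be as simple as this (the parameter names are illustrative):

```python
class Scalar:
    def __init__(self, value, prevs=()):
        self.value = value
        self.prevs = prevs

    def __add__(self, other):
        return Scalar(self.value + other.value, prevs=(self, other))

    def __mul__(self, other):
        return Scalar(self.value * other.value, prevs=(self, other))

def linear_regression(x, a, b):
    # a * x + b, syntactically identical to the plain-number version
    return a * x + b

y = linear_regression(Scalar(2.0), a=Scalar(3.0), b=Scalar(1.0))
print(y.value)  # 7.0
```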

We can already glimpse the power of operator overloading: the function `linear_regression` is syntactically the same as the vanilla Python version, but this time, it can operate on our powerful `Scalar` objects. (Well, again, soon-to-be powerful.)

With this, we can even fully trace back the computations.
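For instance, a small helper could walk the `prevs` references and print the whole computation tree (again a sketch on top of the assumed attributes):

```python
class Scalar:
    def __init__(self, value, prevs=()):
        self.value = value
        self.prevs = prevs

    def __repr__(self):
        return f"Scalar({self.value})"

    def __add__(self, other):
        return Scalar(self.value + other.value, prevs=(self, other))

    def __mul__(self, other):
        return Scalar(self.value * other.value, prevs=(self, other))

def trace(node, depth=0):
    # recursively print the node and, indented below it, its input nodes
    print("  " * depth + repr(node))
    for prev in node.prevs:
        trace(prev, depth + 1)

a, b, c = Scalar(1), Scalar(3), Scalar(2)
trace(c * (a + b))
# Scalar(8)
#   Scalar(2)
#   Scalar(4)
#     Scalar(1)
#     Scalar(3)
```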

That's cool, but why would we ever want to do that? It's not clear at this point, but trust me on this, tracing the computations backward is a key tool in calculating derivatives effectively.

But why would we want to compute the derivative? Because we want to fit the model with gradient descent.

See you in the next post, where we will witness the true power of computational graphs!