The Palindrome

Epsilons, no. 7: What is a kernel?

Unraveling the most overloaded term in mathematics

Tivadar Danka
May 31, 2024

Yesterday, I got the following question in the Mathematics of Machine Learning early access Discord:

Hi @Tivadar, I was reading the Gemma-10M Technical Overview on Medium. In the last paragraph under Infini-Attention, the author gave an intuition for kernels:

“But remember, while these simple linear operations generally work, there’s a large graveyard of papers showing that we can’t effectively model softmax(QKᵀ) with just a linear matrix multiplications. Therefore, instead of just embedding Q and K into our matrix directly, we instead apply a kernel to both Q and K, with the hope that the softmax operation can be learned as an affine transformation of Q and K in the kernel, σ. This can be a lot to digest, and I find it helpful to think of kernels in terms of how (x+y)² = x² + 2xy + y² are non-linear in both x, y; however, linear in [x², xy, y²]. In this case, our kernel σ.”

I went back to your book to look for a definition of a kernel. It was mentioned in your book for orthogonality but not…
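
The polynomial intuition in the quoted passage can be checked numerically. Here is a minimal sketch, assuming a hypothetical feature map `phi` that sends (x, y) to [x², xy, y²] and a weight vector `w = (1, 2, 1)`; these names are mine and do not come from the quoted article or from Gemma-10M. It only illustrates that (x+y)² is a plain dot product in the lifted coordinates, while it is not linear in x and y themselves.

```python
def phi(x, y):
    """Hypothetical feature map: lift (x, y) to the monomials [x^2, xy, y^2]."""
    return [x * x, x * y, y * y]


def linear_in_features(x, y, w=(1.0, 2.0, 1.0)):
    """Dot product <w, phi(x, y)>. With w = (1, 2, 1) it equals (x + y)^2."""
    return sum(w_i * f_i for w_i, f_i in zip(w, phi(x, y)))


def square_of_sum(x, y):
    """The original expression, non-linear in x and y."""
    return (x + y) ** 2


# Linear in the lifted features: both sides agree.
print(linear_in_features(3, 4), square_of_sum(3, 4))

# Not linear in (x, y): doubling the input does not double the output.
print(square_of_sum(2, 0), 2 * square_of_sum(1, 0))
```

The weights are fixed by hand here. In the setting the quote describes, the hope is that comparable weights can be learned once Q and K have been passed through σ.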
