5 Books That Will Teach You the Math Behind Machine Learning

Thanks to the explosive growth of open-source machine learning and deep learning frameworks, the field is more accessible than ever. Machine learning has gone from a tool for researchers to a widely adopted method, fueling the rapid growth of technology we are experiencing now. Understanding how the algorithms really work can give you a huge advantage in designing, developing, and debugging machine learning systems. Because the subject is so mathematical, this task can seem daunting to many. However, it does not have to be that way.

From a high level, there are four pillars of mathematics in machine learning.

  1. Linear algebra
  2. Probability theory
  3. Multivariate calculus
  4. Optimization theory

It takes time to build a solid foundation in these and to understand the inner workings of state-of-the-art machine learning algorithms such as convolutional networks, generative adversarial networks, and many others. This won’t be an afternoon project, but if you consistently dedicate time to it, you can go pretty far in a short amount of time. There are some great resources to guide you along the way. In this post, I have selected the five that were most helpful for me.

Linear Algebra Done Right by Sheldon Axler

Linear algebra is a beautiful but tough subject for beginners if it is taught the “classical” way, which is determinants and matrices first, vector spaces later. However, when it is done the other way around, it is surprisingly intuitive and clear. This book presents linear algebra in a very friendly and insightful way. I wish I had learned it from this book, instead of the old way.

You can find the author’s page about the book here.

Probability: For the Enthusiastic Beginner by David Morin

Most machine learning books don’t introduce probability theory properly and they use confusing notation, often mixing up density functions and discrete distributions. This can be very difficult to get through without a solid background in probability.

This book will provide you with just that: a detailed, mathematically correct yet user-friendly introduction to the subject. It is suitable for learners without any previous exposure to probability.
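To make the difference between discrete distributions and density functions concrete, here is a minimal sketch in plain Python. It is not taken from the book; the fair die, the uniform density on [0, 0.5], and the helper names `uniform_density` and `integrate` are assumptions chosen purely for illustration.

```python
from fractions import Fraction

# Discrete case: a probability mass function assigns a probability to each outcome.
# Illustrative assumption: a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
pmf_total = sum(die_pmf.values())  # the probabilities themselves sum to 1


# Continuous case: a density is not a probability. Its values can exceed 1;
# only its integral over a region is a probability.
def uniform_density(x, low=0.0, high=0.5):
    """Density of a uniform distribution on [low, high] (hypothetical example)."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


density_at_point = uniform_density(0.25)     # a density value larger than 1
area = integrate(uniform_density, 0.0, 0.5)  # the total area under the density

print(pmf_total, density_at_point, area)
```

The point of the sketch is that summing applies to mass functions and integrating applies to densities. Notation that blurs the two hides exactly this difference.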

If you want to learn what probability really is, I wrote an introduction to probability from a more abstract perspective.

Multivariate Calculus by Denis Auroux (from MIT OpenCourseWare)

I have cheated a little bit here, since this is not a book but an actual university course on multivariate calculus at MIT, recorded and made available to the public. Out of all the resources I know, this is by far the best introduction to the subject. It doesn’t hurt to have a background in univariate calculus, but the lectures can be followed without it as well.

You can find the full course here.

One thing this course doesn’t cover well is the gradient descent algorithm, which is fundamental for neural networks. If you would like to learn more about this, I wrote an introductory post on the subject, which explains gradient descent from scratch.
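As a rough illustration of the idea (a minimal sketch, not taken from the course or from the post mentioned above), gradient descent repeatedly steps against the gradient of a function to move towards a minimum. The function f(x, y) = (x - 3)^2 + (y + 1)^2, the learning rate, and the step count below are assumptions chosen only for this example.

```python
def gradient(point):
    """Gradient of the illustrative function f(x, y) = (x - 3)^2 + (y + 1)^2."""
    x, y = point
    return (2.0 * (x - 3.0), 2.0 * (y + 1.0))


def gradient_descent(start, learning_rate=0.1, steps=200):
    """Take `steps` fixed-size steps against the gradient, starting from `start`."""
    x, y = start
    for _ in range(steps):
        grad_x, grad_y = gradient((x, y))
        # Move a small step in the direction of steepest descent.
        x -= learning_rate * grad_x
        y -= learning_rate * grad_y
    return x, y


minimum = gradient_descent((0.0, 0.0))
print(minimum)  # should be very close to the true minimizer (3, -1)
```

Training a neural network follows the same pattern, except that the function being minimized is the loss and the point being updated is the full set of network weights.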

Grokking Deep Learning by Andrew Trask

This book is probably my favorite in this list. I love all of them, but if you only have time to read one, read this one.

It contains a complete hands-on introduction to the inner workings of neural networks, with code snippets covering all of the material. Even though it is not specifically geared towards advanced mathematics, by the end of this book you’ll know more about the mathematics of deep learning than 95% of data scientists, machine learning engineers, and other developers.

You’ll also build a neural network from scratch, which is probably the best learning exercise you can undertake. When I was starting out with machine learning, I also built a convolutional network from scratch in pure NumPy. If you are interested, I wrote a detailed guide on how to do it yourself.

Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville

This is where all of the theory you have learned comes together. Written by some of the greatest minds in machine learning, this book synthesizes the mathematical theory and puts the heavy machinery to use, providing a solid guide to state-of-the-art deep learning methods such as convolutional and recurrent networks, autoencoders, and many more.

The best part is that the book is freely available online for everyone. Given that this is the number one resource for deep learning researchers and developers, this is pretty great.

Among all of the resources I have listed here, this is probably the most difficult to read. Understanding deep learning requires you to look at the algorithms from a probabilistic perspective, which can be difficult. If you would like to learn how a problem can be translated into the language of probability and statistics, I have written a detailed guide for you, where I explain the most important details in a beginner-friendly way.

Let’s get to learning!

As I have mentioned, you probably won’t be able to burn through all these resources in an afternoon. You’ll need to work hard, but building up knowledge is the best investment you can make. It will pay off later by giving you a huge advantage in building machine learning systems. Not to mention that the theory behind machine learning is beautiful.
