It is common knowledge among Python developers that NumPy is faster than vanilla Python. However, it is also true that if you use it wrong, it might hurt your performance. To know when it is beneficial to use NumPy, we have to understand how it works.
In this post, we are going to take a detailed look at why NumPy can be faster, and when using it is suboptimal.
Random numbers in Python
Our toy problem is going to be random number generation. Suppose that we need just a single random number. Should we use NumPy? Let’s test it! We are going to compare it with the built-in random number generator by running both ten million times, measuring the execution time.
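One way to run this comparison is with the standard timeit module (a minimal sketch; the repetition count is the ten million mentioned above, and the exact timings will differ from machine to machine):

```python
import timeit

# time ten million single-number generations with each method
t_builtin = timeit.timeit("random.random()", setup="import random", number=10_000_000)
t_numpy = timeit.timeit("np.random.random()", setup="import numpy as np", number=10_000_000)

print(f"built-in random: {t_builtin:.3f} s")
print(f"NumPy random:    {t_numpy:.3f} s")
```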
For me, the results are the following.
So, for a single random number, NumPy is significantly slower. Why is this the case? What if we need an array instead of a single number? Will it also be slower?
This time, let’s generate a list/array of a thousand elements.
(I don’t want to wrap the expressions to be timed in lambdas, since function calls have an overhead in Python. To be as precise as possible, I pass the expressions as strings to the timeit module instead.)
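The comparison might look like this (again a sketch with timeit; the statements are passed as strings, so no extra lambda call is timed):

```python
import timeit

# generate a list/array of 1000 random numbers, repeated 10 000 times
t_builtin = timeit.timeit(
    "[random.random() for _ in range(1000)]",
    setup="import random",
    number=10_000,
)
t_numpy = timeit.timeit(
    "np.random.random(size=1000)",
    setup="import numpy as np",
    number=10_000,
)

print(f"built-in random list: {t_builtin:.3f} s")
print(f"NumPy random array:   {t_numpy:.3f} s")
```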
Things are much different now. When we generate an array of random numbers, NumPy wins hands down.
There are some curious things about this result as well. First, we generated a single random number 10 000 000 times; second, we generated an array of 1000 random numbers 10 000 times. In both cases, we end up with 10 000 000 random numbers. With the built-in method, collecting the numbers into lists took ~2x as long as generating them one by one. With NumPy, however, generating arrays was ~30x faster than generating the same numbers one by one!
To see what happens behind the scenes, we are going to profile the code.
Dissecting the code: profiling with cProfile
To see how much time the script spends in each function, we are going to use the cProfile module.
1. Built-in random for generating a single number
Let’s take a look at the built-in function first. In the following script, we create 10 000 000 random numbers, just as before.
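A minimal reconstruction of such a script might look like this (the filename builtin_random_single.py matches the cProfile command below):

```python
# builtin_random_single.py
import random

# generate ten million random numbers one by one, discarding each
for _ in range(10_000_000):
    random.random()
```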
We use cProfile from the command line:
python -m cProfile -s tottime builtin_random_single.py
For our purposes, there are two important columns here. The ncalls column shows how many times a function was called, while tottime is the total time spent in a function, excluding time spent in subfunctions.
So the built-in random.random() function was called 10 000 000 times, as expected, and the tottime column shows how much time was spent inside it.
What about the NumPy version?
2. NumPy random for generating a single number
Here is the script we profile.
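A sketch of what that script might look like (the same loop as before, with NumPy's generator swapped in):

```python
# numpy_random_single.py
import numpy as np

# generate ten million random numbers one by one, discarding each
for _ in range(10_000_000):
    np.random.random()
```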
The results are surprising:
As before, the numpy.random.random() function was indeed called 10 000 000 times, as we expect. Yet the script spent significantly more time in this function than in the built-in random before. Thus, it is more costly per function call.
However, when we start working with arrays and lists, things change dramatically.
3. Built-in random for generating a list of random numbers
As before, let’s generate a list of 1000 random numbers 10 000 times.
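A reconstruction of that script might be:

```python
# builtin_random_list.py
import random

# build a list of 1000 random numbers, repeated 10 000 times
for _ in range(10_000):
    [random.random() for _ in range(1000)]
```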
The result of the profiling is not that surprising.
As we see, about 60% of the time was spent on the list comprehensions: 10 000 calls, 0.628 s in total. (Recall that tottime doesn't count time spent in subfunctions, such as the calls to random.random() inside the comprehension.)
Now we are ready to see why NumPy is faster when it is used right.
4. NumPy random for generating an array of random numbers
The script is bare-bones as before.
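Reconstructed, it might look like this:

```python
# numpy_random_array.py
import numpy as np

# generate an array of 1000 random numbers, repeated 10 000 times
for _ in range(10_000):
    np.random.random(size=1000)
```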
This is the result of profiling.
There are only 10 000 calls, and even though each call takes longer, each one yields a numpy.ndarray of 1000 random numbers. The reason why NumPy is fast when used right is that its arrays are extremely efficient: they resemble C arrays rather than Python lists. There are two significant differences between them.
- Python lists are dynamic, so you can append and remove elements, for instance. NumPy arrays have a fixed length, so you cannot add or delete elements without creating a new array. (And creating an array is costly.)
- Python lists can hold several datatypes at the same time, while a NumPy array can only contain one.
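Both differences are easy to see in an interactive session (a small illustration; the variable names are mine):

```python
import numpy as np

arr = np.array([1, 2, 3])

# "appending" allocates and fills a brand-new array; the original is untouched
bigger = np.append(arr, 4)
print(arr.size, bigger.size)  # 3 4

# a Python list happily mixes types...
py_list = [1, "two", 3.0]

# ...but NumPy coerces everything to a single dtype
mixed = np.array([1, 2.5])
print(mixed.dtype)  # float64
```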
So, they are less flexible but significantly more performant. When this additional flexibility is not needed, NumPy outperforms Python.
Where is the break-even point?
To see exactly at which size NumPy overtakes Python in random number generation, we can compare the two by measuring the execution times for several sizes.
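One way to sketch this measurement (the size grid is my own choice; absolute timings will vary by machine):

```python
import timeit

# compare list-comprehension vs. NumPy generation across several sizes
sizes = [10, 50, 100, 200, 500, 1000]
for n in sizes:
    t_builtin = timeit.timeit(
        f"[random.random() for _ in range({n})]",
        setup="import random",
        number=1000,
    )
    t_numpy = timeit.timeit(
        f"np.random.random(size={n})",
        setup="import numpy as np",
        number=1000,
    )
    winner = "NumPy" if t_numpy < t_builtin else "built-in"
    print(f"size {n:>5}: built-in {t_builtin:.4f} s, NumPy {t_numpy:.4f} s -> {winner}")
```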
We can see that at a size of around 200 elements, NumPy starts to overtake Python. Of course, this number might be different for other operations like calculating the sine or adding numbers together, but the tendency will be the same: Python slightly outperforms NumPy for small input sizes, but as the size grows, NumPy wins by a large margin.
NumPy can provide significant performance improvement when used right. However, for certain situations, Python can be a better choice. If you are not aware when to use NumPy, you might end up hurting your performance.
In general, you might be better off with plain old Python if, for example,
- you work with small lists,
- you want to frequently add/remove from the list.
When optimizing for performance, always think about how things work on the inside. This way, you can really supercharge your code, even in Python.