Is NumPy really faster than Python?

It is common knowledge among Python developers that NumPy is faster than vanilla Python. However, it is also true that if you use it wrong, it might hurt your performance. To know when it is beneficial to use NumPy, we have to understand how it works.

In this post, we are going to take a detailed look at why NumPy can be faster and when it is suboptimal to use.

Random numbers in Python

Our toy problem is going to be random number generation. Suppose that we need just a single random number. Should we use NumPy? Let’s test it! We are going to compare it with the built-in random number generator by running both ten million times, measuring the execution time.

from timeit import timeit
from numpy.random import random as random_np
from random import random as random_py
n_runs = 10000000
t_builtin = timeit(random_py, number=n_runs)
t_numpy = timeit(random_np, number=n_runs)
print(f"Built-in random:\t{t_builtin}s")
print(f"NumPy random: \t{t_numpy}s")

For me, the results are the following.

So, for a single random number, NumPy is significantly slower. Why is this the case? What if we need an array instead of a single number? Will it also be slower?

This time, let’s generate a list/array of a thousand elements.

from timeit import timeit
size = 1000
n_runs = 10000
t_builtin_list = timeit(
"[random_py() for _ in range(size)]",
setup=f"from random import random as random_py; size={size}",
number=n_runs
)
t_numpy_array = timeit(
"random_np(size)",
setup=f"from numpy.random import random as random_np; size={size}",
number=n_runs
)
print(f"Built-in random with lists:\t{t_builtin_list}s")
print(f"NumPy random with arrays: \t{t_numpy_array}s")

(I don’t want to wrap the expressions to be timed in lambdas, since function calls have an overhead in Python. I want to be as precise as possible, so I pass them as strings to the timeit function.)
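As a rough illustration of the point about lambdas, the sketch below times the same call once as a string statement and once wrapped in a lambda. The run count and variable names are made up for this example, and the exact numbers will vary from machine to machine.

```python
from timeit import timeit
from random import random as random_py

n_runs = 100_000  # smaller than in the post, so the sketch finishes quickly

# Statement passed as a string: timeit compiles it straight into its timing loop.
t_string = timeit(
    "random_py()",
    setup="from random import random as random_py",
    number=n_runs,
)

# Same call wrapped in a lambda: every iteration pays for one extra function call.
t_lambda = timeit(lambda: random_py(), number=n_runs)

print(f"String statement:\t{t_string}s")
print(f"Lambda wrapper: \t{t_lambda}s")
```

On most setups the lambda version should come out somewhat slower, since the extra call is measured together with the code we actually care about.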

Things are much different now. When we generate an array of random numbers, NumPy wins hands down.

There are some curious things about this result as well. First, we generated a single random number 10 000 000 times. Second, we generated an array of 1000 random numbers 10 000 times. In both cases, we end up with 10 000 000 random numbers. With the built-in method, it took about twice as long when we put them in a list. With NumPy, however, generating arrays was ~30x faster than generating the same amount of numbers one by one!

To see what happens behind the scenes, we are going to profile the code.

Dissecting the code: profiling with cProfile

To see how much time the script spends in each function, we are going to use cProfile, the profiler that ships with Python.

1. Built-in random for generating a single number

Let’s take a look at the built-in function first. In the following script, we create 10 000 000 random numbers, just as before.

from random import random as random_py
n_runs = 10000000
for _ in range(n_runs):
    random_py()

We run cProfile from the command line:

python -m cProfile -s tottime builtin_random_single.py
10001350 function calls (10001307 primitive calls) in 1.088 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.706 0.706 1.088 1.088 builtin_random_single.py:1(<module>)
10000000 0.380 0.000 0.380 0.000 {method 'random' of '_random.Random' objects}
1 0.001 0.001 0.001 0.001 {built-in method _imp.create_dynamic}
3 0.000 0.000 0.000 0.000 {built-in method marshal.loads}
5 0.000 0.000 0.000 0.000 {built-in method _imp.create_builtin}

For our purposes, there are two important columns here. ncalls shows how many times a function was called, while tottime is the total time spent in a function, excluding time spent in the functions it calls.

So the built-in function random.random() was called 10 000 000 times as expected, and the total time spent in that function was 0.380 seconds.
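The same numbers can also be collected from inside a script with the standard cProfile and pstats modules. The following is a minimal sketch, not the setup used for the measurements in this post: the helper function generate_numbers and the smaller run count are made up for illustration, and reading the stats dictionary directly relies on how pstats stores its data internally.

```python
import cProfile
import pstats
from random import random as random_py

n_runs = 100_000  # smaller than in the post, so the sketch finishes quickly


def generate_numbers():
    """Hypothetical helper: call the built-in generator n_runs times."""
    for _ in range(n_runs):
        random_py()


profiler = cProfile.Profile()
profiler.enable()
generate_numbers()
profiler.disable()

# Sort by tottime (time spent in the function itself) and print the top 5 rows.
stats = pstats.Stats(profiler)
stats.sort_stats("tottime").print_stats(5)

# Assumption: stats.stats maps (filename, line, function name) to
# (primitive calls, total calls, tottime, cumtime, callers).
random_calls = sum(
    entry[1]
    for (_, _, name), entry in stats.stats.items()
    if "method 'random'" in name
)
print(f"random() was called {random_calls} times")
```

The printed table has the same columns as the command-line output above, so ncalls and tottime can be read off in the same way.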

What about the NumPy version?

2. NumPy random for generating a single number

Here is the script we profile.

from numpy.random import random as random_np
n_runs = 10000000
for _ in range(n_runs):
    random_np()

The results are surprising:

10075395 function calls (10072374 primitive calls) in 3.027 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
10000000 2.076 0.000 2.076 0.000 {method 'random' of 'numpy.random.mtrand.RandomState' objects}
1 0.827 0.827 3.028 3.028 numpy_random_single.py:1(<module>)
316 0.020 0.000 0.020 0.000 {built-in method builtins.compile}
115 0.015 0.000 0.015 0.000 {built-in method marshal.loads}
19 0.008 0.000 0.010 0.001 {built-in method _imp.create_dynamic}

As before, the numpy.random.random() function was called 10 000 000 times, as we expect. Yet the script spent significantly more time in this function than in the built-in random: 2.076 seconds compared to 0.380 seconds, roughly 5.5 times as much. Thus, it is more costly per function call.

However, when we start working with arrays and lists, things change dramatically.

3. Built-in random for generating a list of random numbers

As before, let’s generate a list of 1000 random numbers 10 000 times.

from random import random as random_py
size = 1000
n_runs = 10000
for _ in range(n_runs):
    [random_py() for _ in range(size)]

The result of the profiling is not that surprising.

10011350 function calls (10011307 primitive calls) in 1.090 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
10000 0.628 0.000 1.040 0.000 builtin_random_list.py:8(<listcomp>)
10000000 0.412 0.000 0.412 0.000 {method 'random' of '_random.Random' objects}
1 0.047 0.047 1.090 1.090 builtin_random_list.py:1(<module>)
1 0.001 0.001 0.001 0.001 {built-in method _imp.create_dynamic}
3 0.000 0.000 0.000 0.000 {built-in method marshal.loads}

As we see, about 60% of the time was spent on the list comprehensions: 10 000 calls, 0.628s total. (Recall that tottime doesn’t count subfunction calls like calls to random.random() here.)

Now we are ready to see why NumPy is faster when it is used right.

4. NumPy random for generating an array of random numbers

The script is bare-bones as before.

from numpy.random import random as random_np
size = 1000
n_runs = 10000
for _ in range(n_runs):
    random_np(size)

This is the result of profiling.

85395 function calls (82374 primitive calls) in 0.188 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
10000 0.062 0.000 0.062 0.000 {method 'random' of 'numpy.random.mtrand.RandomState' objects}
316 0.020 0.000 0.020 0.000 {built-in method builtins.compile}
115 0.015 0.000 0.015 0.000 {built-in method marshal.loads}
19 0.008 0.000 0.010 0.001 {built-in method _imp.create_dynamic}
238/237 0.004 0.000 0.006 0.000 {built-in method builtins.__build_class__}

There are only 10 000 calls, and even though each call takes longer than a call for a single number, each one returns a numpy.ndarray of 1000 random numbers. The reason why NumPy is fast when used right is that its arrays are extremely efficient. They are like C arrays instead of Python lists. There are two significant differences between them.

  • Python lists are dynamic, so you can append and remove elements for instance. NumPy arrays have fixed length, so you cannot add or delete without creating a new one. (And creating an array is costly.)
  • Python lists can hold several datatypes at the same time, while a NumPy array can only contain one.


So, they are less flexible but significantly more performant. When this additional flexibility is not needed, NumPy outperforms Python.
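The two differences listed above can be seen in a few lines. This is only a small sketch with made-up variable names; it does not measure anything, it just shows the behavior.

```python
import numpy as np

# A Python list can grow in place and can mix types.
py_list = [1, 2.5, "three"]
py_list.append(4)  # modifies the same list object

# A NumPy array has a fixed length: np.append returns a new array
# and leaves the original one untouched.
arr = np.array([1.0, 2.0, 3.0])
bigger = np.append(arr, 4.0)

# A NumPy array has a single dtype: mixing an int and a float
# produces an array where every element is a float.
mixed = np.array([1, 2.5])

print(len(py_list), arr.size, bigger.size, mixed.dtype)
```

Because np.append has to allocate a new array and copy the old elements every time, growing an array element by element in a loop is usually a poor fit for NumPy.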

Where is the break-even point?

To see exactly at which size NumPy overtakes Python in random number generation, we can compare the two by measuring the execution times across several sizes.

We can see that at a size of around 200, NumPy starts to overtake Python. Of course, this number might be different for other operations like calculating the sine or adding numbers together, but the tendency will be the same. Python will slightly outperform NumPy for small input sizes, but as the size grows, NumPy wins by a large margin.
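A comparison like this could be set up roughly as follows. This is a sketch rather than the exact script behind the measurement described above: the list of sizes and the run count are chosen arbitrarily, and the break-even point you observe will depend on your machine and library versions.

```python
from timeit import timeit
from random import random as random_py
from numpy.random import random as random_np

sizes = [1, 10, 100, 1000]  # hypothetical sizes, pick your own
n_runs = 1000               # kept small so the sketch finishes quickly

results = {}
for size in sizes:
    # Names used inside the timed statements are looked up in this dict.
    namespace = {"random_py": random_py, "random_np": random_np, "size": size}
    t_list = timeit(
        "[random_py() for _ in range(size)]",
        globals=namespace,
        number=n_runs,
    )
    t_array = timeit("random_np(size)", globals=namespace, number=n_runs)
    results[size] = (t_list, t_array)

for size, (t_list, t_array) in results.items():
    print(f"size={size}\tlist: {t_list:.6f}s\tarray: {t_array:.6f}s")
```

The size at which the array column drops below the list column is the break-even point for this particular operation.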

Summary

NumPy can provide a significant performance improvement when used right. However, in certain situations, Python can be a better choice. If you are not aware of when to use NumPy, you might end up hurting your performance.

In general, you might be better off with plain old Python if, for example,

  • you work with small lists,
  • you frequently add elements to or remove elements from the list.

When optimizing for performance, always think about how things work on the inside. This way, you can really supercharge your code, even in Python.
