Python Performance: A Comparison

Contributor: QuantDare

By: QuantDare Blog https://quantdare.com/author/jgonzalezomega/

This article was first posted on QuantDare Blog.

Performance is an important consideration in any programming language, but in Python it becomes crucial. In this post, we will see how the way we write a function, and whether or not we use a library, can make a dramatic difference in performance.

Let’s look at two possible implementations of a simple function that applies a different transformation depending on the input value:

def some_calcs1(x):
    if x > 0.04:
        return 0.4
    elif x > 0.01:
        return 10 * x
    elif x > 0:
        return 10 * x / 3.0 + 1 / 15.0
    else:
        return 0

This second function would obtain the same result as the previous one:

def some_calcs2(x):
    return (x > 0.04) * 0.4 + \
        ((x > 0.01) & (x <= 0.04)) * x * 10.0 + \
        ((x > 0) & (x <= 0.01)) * (x * 10.0 / 3.0 + 1.0 / 15.0) + \
        0.0
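
As a quick sanity check on the claim that both implementations return the same result, the sketch below compares them on one value per branch plus the boundary values. The sample values are arbitrary choices made for illustration, not taken from the original post.

```python
import math


def some_calcs1(x):
    if x > 0.04:
        return 0.4
    elif x > 0.01:
        return 10 * x
    elif x > 0:
        return 10 * x / 3.0 + 1 / 15.0
    else:
        return 0


def some_calcs2(x):
    return (x > 0.04) * 0.4 + \
        ((x > 0.01) & (x <= 0.04)) * x * 10.0 + \
        ((x > 0) & (x <= 0.01)) * (x * 10.0 / 3.0 + 1.0 / 15.0) + \
        0.0


# Illustrative inputs: one per branch, plus the boundaries 0, 0.01 and 0.04.
samples = [-1.0, 0.0, 0.005, 0.01, 0.02, 0.04, 0.5]

# Compare with a small absolute tolerance so that 0 and -0.0 count as equal.
all_match = all(
    math.isclose(some_calcs1(x), some_calcs2(x), abs_tol=1e-12)
    for x in samples
)
print(all_match)
```

Note that for a scalar input the second version still evaluates every comparison and every product, which is what the timings discussed later pick up on.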

Let’s define three equivalent variables of different types: a list of lists, a NumPy array and a Pandas DataFrame:

import pandas as pd
import numpy as np

xarray = np.random.rand(1000,10)
xlist = xarray.tolist()  # call the method; without () xlist would be the method itself
xdf = pd.DataFrame(xarray)

Lists

If you apply the two functions defined above to a list of lists, the first function is 5 times faster than the second, purely as a consequence of the way they are coded. In function 1, only the branch whose condition is fulfilled is executed, while function 2 evaluates every comparison and every arithmetic term for every value.

%timeit z1 = [list(map(some_calcs1, z)) for z in xlist]
%timeit z2 = [list(map(some_calcs2, z)) for z in xlist]
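
The `%timeit` lines above only work inside IPython or Jupyter. As a rough equivalent for a plain Python script, the sketch below uses the standard-library `timeit` module. The repetition count of 20 and the helper names `run1` and `run2` are assumptions made for this example; the absolute timings depend on the machine, so no expected figures are shown.

```python
import random
import timeit


def some_calcs1(x):
    if x > 0.04:
        return 0.4
    elif x > 0.01:
        return 10 * x
    elif x > 0:
        return 10 * x / 3.0 + 1 / 15.0
    else:
        return 0


def some_calcs2(x):
    return (x > 0.04) * 0.4 + \
        ((x > 0.01) & (x <= 0.04)) * x * 10.0 + \
        ((x > 0) & (x <= 0.01)) * (x * 10.0 / 3.0 + 1.0 / 15.0) + \
        0.0


# A 1000 x 10 list of lists of uniform random floats, built without NumPy.
random.seed(0)
xlist = [[random.random() for _ in range(10)] for _ in range(1000)]


def run1():
    return [list(map(some_calcs1, row)) for row in xlist]


def run2():
    return [list(map(some_calcs2, row)) for row in xlist]


z1 = run1()
z2 = run2()

repeats = 20  # hypothetical repetition count, chosen to keep the run short
t1 = timeit.timeit(run1, number=repeats)
t2 = timeit.timeit(run2, number=repeats)
print(f"some_calcs1: {t1 / repeats * 1e3:.2f} ms per pass")
print(f"some_calcs2: {t2 / repeats * 1e3:.2f} ms per pass")
```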

NumPy arrays

But what if we worked with NumPy arrays instead of lists? Can we expect the same behaviour?

First of all, in order to “map” the first function over a NumPy array we need to vectorize it (we will use a decorator); otherwise it would not work, because the if conditions cannot be evaluated on a whole array. Vectorizing a function allows us to apply it to the whole array without writing the loop ourselves.

import numpy as np

@np.vectorize
def some_calcs1_vec(x):
    if x > 0.04:
        return 0.4
    elif x > 0.01:
        return 10 * x
    elif x > 0:
        return 10 * x / 3.0 + 1 / 15.0
    else:
        return 0
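
To illustrate why the undecorated function cannot be applied to an array, the sketch below calls it directly and catches the resulting `ValueError`, then wraps it with `np.vectorize`. Passing `otypes=[float]` is an addition made for this example, not part of the original post: without it, `np.vectorize` infers the output type from the first element, and the integer `0` in the last branch could in principle select an integer output.

```python
import numpy as np


def some_calcs1(x):
    if x > 0.04:
        return 0.4
    elif x > 0.01:
        return 10 * x
    elif x > 0:
        return 10 * x / 3.0 + 1 / 15.0
    else:
        return 0


xarray = np.random.rand(1000, 10)

# "if x > 0.04" needs a single True/False, but an array comparison
# yields one boolean per element, so NumPy refuses to pick one.
try:
    some_calcs1(xarray)
    raised = False
except ValueError as exc:
    raised = True
    print("plain function failed:", exc)

# Wrapping with np.vectorize calls some_calcs1 once per element.
# otypes=[float] (an assumption of this sketch) fixes the output dtype.
some_calcs1_vec = np.vectorize(some_calcs1, otypes=[float])
result = some_calcs1_vec(xarray)
print(result.shape, result.dtype)
```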

At first sight, we see that performance has improved a little with the first function, and tremendously with the second one. But what is most surprising is that the second function is now much faster than the first! Weren’t we saying that the first implementation was faster? Let’s explain what is going on here:

%timeit z1 = some_calcs1_vec(xarray)
%timeit z2 = some_calcs2(xarray)

NumPy has what it calls universal functions (ufuncs): functions that receive array-like inputs and return array outputs, operating on each element. This is much the same as what we do when vectorizing, but with faster results, since these functions loop over the elements at a lower level (in compiled C code), whereas np.vectorize still calls the Python function once per element. In addition, ufuncs broadcast (adjust) the input arrays when they have different shapes.

So the first function is merely “generalized” to behave like a ufunc, while the second function actually uses ufuncs.

Alright, but I don’t see any ufunc at all! Well, the operators you see in the formulas, such as *, +, & and >, are overloaded for arrays and dispatch to the ufuncs multiply(), add(), bitwise_and() and greater(); on boolean arrays, bitwise_and() gives the same result as logical_and(). So, for example, x + 4.0 is the same as np.add(x, 4.0).
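
A small sketch of the broadcasting behaviour mentioned above: `np.add` combines a column of shape (3, 1) with a row of shape (4,) and produces a (3, 4) result, matching an explicit double loop. The array values are arbitrary and chosen only for illustration.

```python
import numpy as np

col = np.array([[0.0], [10.0], [20.0]])  # shape (3, 1)
row = np.array([1.0, 2.0, 3.0, 4.0])     # shape (4,)

# The ufunc stretches both inputs to a common (3, 4) shape
# and adds them element by element in compiled code.
total = np.add(col, row)

# The same computation written as an explicit Python loop.
looped = np.empty((3, 4))
for i in range(3):
    for j in range(4):
        looped[i, j] = col[i, 0] + row[j]

print(total.shape)
print(np.array_equal(total, looped))
```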
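
The correspondence between operators and ufuncs can be checked directly. In the sketch below (sample values are arbitrary), each operator expression is compared against the explicit ufunc call. Note that on arrays `&` dispatches to `np.bitwise_and`, which for boolean inputs gives the same result as `np.logical_and`.

```python
import numpy as np

x = np.array([-0.5, 0.005, 0.02, 0.5])

lower = x > 0.01
upper = x <= 0.04
mask = lower & upper

# Each entry is True when the operator and the explicit ufunc call agree.
checks = {
    "add": np.array_equal(x + 4.0, np.add(x, 4.0)),
    "multiply": np.array_equal(x * 10.0, np.multiply(x, 10.0)),
    "greater": np.array_equal(lower, np.greater(x, 0.01)),
    "bitwise_and": np.array_equal(mask, np.bitwise_and(lower, upper)),
    "logical_and": np.array_equal(mask, np.logical_and(lower, upper)),
}
print(checks)
```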

Pandas DataFrame

To conclude, we might wonder whether the Pandas library achieves performance similar to NumPy's, given that Pandas uses NumPy arrays underneath.

If we apply a map operation (applymap, which pandas 2.1 and later deprecate in favour of DataFrame.map) with the functions defined, the performance is slower than the one obtained by mapping a list and, obviously, much slower than NumPy:

%timeit z1 = xdf.applymap(some_calcs1)
%timeit z2 = xdf.applymap(some_calcs2)

If we apply the functions directly to Pandas DataFrames, the first vectorized function performs slower than with a NumPy array; the second function loses all its potential when applied to a DataFrame, with a performance similar to the one obtained with a list.

%timeit z1 = some_calcs1_vec(xdf)
%timeit z2 = some_calcs2(xdf)
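
One possible way to keep the DataFrame interface while running the arithmetic on the underlying array is sketched below. It assumes the DataFrame is entirely numeric, so that `to_numpy()` returns a single float array; the result is then wrapped back into a DataFrame with the original index and columns. This is a suggestion for illustration and was not benchmarked in the original post.

```python
import numpy as np
import pandas as pd


def some_calcs2(x):
    return (x > 0.04) * 0.4 + \
        ((x > 0.01) & (x <= 0.04)) * x * 10.0 + \
        ((x > 0) & (x <= 0.01)) * (x * 10.0 / 3.0 + 1.0 / 15.0) + \
        0.0


xarray = np.random.rand(1000, 10)
xdf = pd.DataFrame(xarray)

# Run the ufunc-based function on the raw NumPy values ...
values = some_calcs2(xdf.to_numpy())

# ... and restore the labels afterwards.
zdf = pd.DataFrame(values, index=xdf.index, columns=xdf.columns)

# The result matches applying the function to the DataFrame directly.
same = np.allclose(zdf.to_numpy(), some_calcs2(xdf).to_numpy())
print(same)
```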

Finally, to complicate matters even further, we could use the DataFrame “apply” method, which applies the specified function to entire rows or columns; as we see below, the choice of the axis to operate on makes a big difference in terms of performance.

In general, it is advisable to avoid this kind of mapping in Pandas if you want acceptable performance.

%timeit z1 = xdf.apply(some_calcs1_vec, axis=0)
%timeit z2 = xdf.apply(some_calcs1_vec, axis=1)
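
A likely reason the axis matters so much is the number of Python-level calls: with a 1000 x 10 DataFrame, `axis=0` hands the function one column at a time, while `axis=1` hands it one row at a time. The sketch below counts the calls with a hypothetical helper, `make_counter`, and a simple scaling function in place of `some_calcs1_vec`. Exact counts may vary slightly between pandas versions, so none are shown.

```python
import numpy as np
import pandas as pd

xdf = pd.DataFrame(np.random.rand(1000, 10))

# Number of times the applied function is invoked, keyed by axis.
calls = {0: 0, 1: 0}


def make_counter(axis):
    """Return a function that scales a Series and records each call."""
    def scale(series):
        calls[axis] += 1
        return series * 10.0
    return scale


by_col = xdf.apply(make_counter(0), axis=0)  # function receives columns
by_row = xdf.apply(make_counter(1), axis=1)  # function receives rows

print(calls)
print(np.allclose(by_col.to_numpy(), by_row.to_numpy()))
```

Both calls produce the same values; only the number of round trips through Python differs.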

So, be warned: the way you implement your code and the choice of the right libraries and functions can make your programs fly or be as slow as molasses in January.
