python dataframe multiplication

faster column-multiply in dataframe


I have a pandas dataframe A that has two columns, x and y. I want to multiply them like B = A['x'] * A['y']. Is there a faster way to do this? Would A['x'].mul(A['y']) be faster?


Solution

  • To find out which is faster, time each variant. In IPython or Jupyter, put this in a cell:

    %%timeit
    d['a'] * d['b']
    

    For a dataframe like this one:

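    import numpy as np
    import pandas as pd

    a = np.arange(0, 10000)
    b = np.ones(10000)

    # Stack the two arrays as the columns of a 10000 x 2 frame
    d = pd.DataFrame(np.vstack([a, b]).T, columns=["a", "b"])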
    

    Then time the multiplication both ways:

    1 - in pandas

    d['a'] * d['b']
    81.2 µs ± 977 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    

    2 - in numpy, avoiding the pandas overhead

    d['a'].values * d['b'].values
    9.21 µs ± 41.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    

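    If speed matters that much, do the arithmetic in numpy: on this 10,000-row frame it ran roughly nine times faster than the pandas version. pandas makes this easy, because .values gives you the underlying array of each column. Note that the result is then a plain numpy array, not a Series.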
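
The question also asks about .mul(), which the timings above do not cover. The script below is a minimal sketch of how one might compare all three variants outside a notebook, using the standard-library timeit module. The candidates dict, the product variable and the a_times_b name are illustrative and not part of the original answer, and .to_numpy() is used here in place of .values. No timings are quoted, because they depend on the machine and on the pandas and numpy versions.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape of data as in the answer: two float columns, 10,000 rows.
d = pd.DataFrame({"a": np.arange(10_000, dtype=float), "b": np.ones(10_000)})

# Illustrative names for the three variants being compared.
candidates = {
    "operator *": lambda: d["a"] * d["b"],
    "Series.mul": lambda: d["a"].mul(d["b"]),
    "NumPy arrays": lambda: d["a"].to_numpy() * d["b"].to_numpy(),
}

# All three must give the same numbers before the timings mean anything.
expected = d["a"].to_numpy() * d["b"].to_numpy()
for name, fn in candidates.items():
    assert np.array_equal(np.asarray(fn()), expected), name

# Best of several repeats; absolute numbers vary by machine and version.
for name, fn in candidates.items():
    best = min(timeit.repeat(fn, number=1_000, repeat=5)) / 1_000
    print(f"{name:>12}: {best * 1e6:.1f} us per call")

# The NumPy product is a bare ndarray; wrap it if the index is still needed.
product = pd.Series(expected, index=d.index, name="a_times_b")
```

If the result has to go back into the dataframe, assigning the array directly (for example d["c"] = expected) also works, since its length matches the frame.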