Tags: python, pandas, function, numpy, numba

Is it possible to return 2 columns from a numba function and use them as 2 new df columns?


I am trying to use numba to reduce the runtime of a calculation over 250k rows. It can be done with df.loc, but that takes a long time because each row's output depends on the previous row's values (recursive inputs and outputs).

Here is my input df

inputa
1
2
3
4

and the desired output columns 'a' and 'b':

inputa  a    b
1       0    0
2       2    3
3       5    7
4       9    12

Basically, the initial values of 'a' and 'b' are 0.

a = previous value of 'a' + current value of inputa

b = previous value of inputa + current value of 'a'
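The rule above can be written out as a plain Python loop over the four sample rows. This is only an illustrative sketch of the recurrence, not the code being optimized, and the helper name recurrence_reference is hypothetical:

```python
def recurrence_reference(inputa):
    """Plain-Python version of the rule: a[0] = b[0] = 0, then
    a[i] = a[i-1] + inputa[i] and b[i] = inputa[i-1] + a[i]."""
    a = [0] * len(inputa)
    b = [0] * len(inputa)
    for i in range(1, len(inputa)):
        a[i] = a[i - 1] + inputa[i]   # previous 'a' + current inputa
        b[i] = inputa[i - 1] + a[i]   # previous inputa + current 'a'
    return a, b


a, b = recurrence_reference([1, 2, 3, 4])
print(a)  # [0, 2, 5, 9]
print(b)  # [0, 3, 7, 12]
```

These values match the 'a' and 'b' columns in the desired output table.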

My current code is this.

@jit(nopython=True)
def foo(inputa):
    a = np.zeros(inputa.shape)
    b = np.zeros(inputa.shape)
    a[0] = 0
    b[0] = 0
    for i in range(1, a.shape[0]) :
        a[i] = a[i-1] + inputa[i]
        b[i] = inputa[i-1] + a[i]
    return a, b
 
df[['a','b']] = foo(df['inputa'].values)     
print(df)

However, I get this error:

TypingError: cannot determine Numba type of <class 'method'>

With a single output column, the numba code works fine. See the code below.

@jit(nopython=True)
def foo(inputa):
    a = np.zeros(inputa.shape)
    #b = np.zeros(inputa.shape)
    a[0] = 0
    #b[0] = 0
    for i in range(1, a.shape[0]) :
        a[i] = a[i-1] + inputa[i]
        #b[i] = inputa[i-1] + a[i]
    return a #, b
 
#df[['a','b']] = foo(df['inputa'].values)
df['a'] = foo(df['inputa'].values)

print(df)

However, as soon as I try to return two columns, I run into the problem above. Basically, I want to compute multiple recursive columns without using df.loc.

Please advise.

Thanks a lot.


Solution

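  • Your approach is fine, except for a typo in the line b[i] = input[i-1] + a[i]: it should read inputa[i-1]. As @Jérôme Richard rightly pointed out, input is a Python built-in, which Numba cannot type, hence the TypingError. Here is a corrected version that also unpacks the two returned arrays into separate columns: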

    import numba as nb
    import numpy as np
    import pandas as pd
    
    # DataFrame
    df = pd.DataFrame(np.arange(1,101),columns=['data'])
    df_vals = df.values.ravel() # .ravel() flattens the 2D array into a 1D array
    
    
    # Regular Python Function
    def foo(arr):
        arrlen = arr.shape[0]
        a = np.zeros(arrlen)
        b = np.zeros(arrlen)
        for i in range(1, a.shape[0]) :
            a[i] = a[i-1] + arr[i]
            b[i] = arr[i-1] + a[i]
        return a, b
    
    # Jitted (nopython) Function
    foo_nb = nb.njit()(foo)
    
    
    # Numba warm-up: the first call triggers JIT compilation, so run it once before timing
    _ = foo_nb(df_vals)
    
    
    a_vals, b_vals = foo_nb(df_vals)
    
    df['a'] = a_vals
    df['b'] = b_vals
    
    # Performance benchmarks. NOTE: %timeit is an IPython magic, so run this part in a Jupyter notebook or IPython.
    print('Non Numba Code Performance : ')
    foo_timeit = %timeit -o -n 1000 foo(df_vals)
    print('\nNumba Code Performance : ')
    foo_nb_timeit = %timeit -o -n 1000 foo_nb(df_vals)
    

    Output :

    Non Numba Code Performance : 
    289 µs ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    Numba Code Performance : 
    1.22 µs ± 174 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
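To tie this back to the question's own four-row example, here is a minimal sketch that applies the same function to the inputa column and assigns both results in one statement. The try/except fallback is an assumption added so the snippet still runs (without the speed-up) when Numba is not installed; it is not part of the answer above.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python if Numba is unavailable.
    def njit(func):
        return func


@njit
def foo(inputa):
    n = inputa.shape[0]
    a = np.zeros(n)
    b = np.zeros(n)
    for i in range(1, n):
        a[i] = a[i - 1] + inputa[i]   # previous 'a' + current inputa
        b[i] = inputa[i - 1] + a[i]   # previous inputa + current 'a'
    return a, b


df = pd.DataFrame({"inputa": [1, 2, 3, 4]})

# foo returns a tuple of two 1D arrays; unpack it into the two new columns.
df["a"], df["b"] = foo(df["inputa"].to_numpy())
print(df)
```

The resulting 'a' and 'b' columns hold the values from the desired output in the question, stored as floats because np.zeros defaults to float64.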