Is it possible to return two output columns from a Numba function and add them as two new DataFrame columns?

I am trying to use Numba to cut the runtime of a calculation over 250k rows. It can be done with df.loc, but df.loc is very slow when each row's output depends on the previous row's values (a recursive calculation).

Here is my input df

inputa
1
2
3
4

and desired output 'a' and 'b'

inputa  a    b
1       0    0
2       2    3
3       5    7
4       9    12

Basically, the initial values of 'a' and 'b' are both 0. For each later row:

a = previous value of 'a' + current value of inputa

b = previous value of inputa + current value of 'a'

My current code is this.

@jit(nopython=True)
def foo(inputa):
    a = np.zeros(inputa.shape)
    b = np.zeros(inputa.shape)
    a[0] = 0
    b[0] = 0
    for i in range(1, a.shape[0]) :
        a[i] = a[i-1] + inputa[i]
        b[i] = inputa[i-1] + a[i]
    return a, b
 
df[['a','b']] = foo(df['inputa'].values)     
print(df)

However, I encountered this error

TypingError: cannot determine Numba type of <class 'method'>

With a single output column, the Numba code works fine. See the code below.

@jit(nopython=True)
def foo(inputa):
    a = np.zeros(inputa.shape)
    #b = np.zeros(inputa.shape)
    a[0] = 0
    #b[0] = 0
    for i in range(1, a.shape[0]):
        a[i] = a[i-1] + inputa[i]
        #b[i] = inputa[i-1] + a[i]
    return a #, b
 
#df[['a','b']] = foo(df['inputa'].values)
df['a'] = foo(df['inputa'].values)

print(df)

However, when I try to return two columns, I run into the error above. Basically, I want to compute multiple recursive columns without using df.loc.

Please advise.

Thanks a lot.

Solution

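Your approach is fine. The error comes from a typo: b[i] = input[i-1] + a[i] uses input instead of inputa. As @Jérôme Richard rightly pointed out, input is a Python built-in, so Numba tries to type the built-in rather than your array and raises the TypingError. With the name fixed, return both arrays from the function and unpack them into two columns: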

import numba as nb
import numpy as np
import pandas as pd

# DataFrame
df = pd.DataFrame(np.arange(1,101),columns=['data'])
df_vals = df.values.ravel() # .ravel to unfold a 2d array into 1d array


# Regular Python Function
def foo(arr):
    arrlen = arr.shape[0]
    a = np.zeros(arrlen)
    b = np.zeros(arrlen)
    for i in range(1, arrlen):
        a[i] = a[i-1] + arr[i]
        b[i] = arr[i-1] + a[i]
    return a, b

# Jitted (nopython) Function
foo_nb = nb.njit()(foo)


# Numba warm-up: the first call triggers JIT compilation, so run it once before timing
_ = foo_nb(df_vals)


a_vals, b_vals = foo_nb(df_vals)

df['a'] = a_vals
df['b'] = b_vals

# Performance benchmarks. NOTE: %timeit is an IPython magic, so run this part in Jupyter Notebook or IPython.
print('Non Numba Code Performance : ')
foo_timeit = %timeit -o -n 1000 foo(df_vals)
print('\nNumba Code Performance : ')
foo_nb_timeit = %timeit -o -n 1000 foo_nb(df_vals)

Output :

Non Numba Code Performance : 
289 µs ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Numba Code Performance : 
1.22 µs ± 174 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
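
As a quick sanity check, the same function can be run on the four-row example from the question. This is only a minimal sketch: the try/except fallback is my own addition so the snippet still runs as a plain Python loop when Numba is not installed, and the column name inputa is taken from the question.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Assumption for illustration: without Numba, fall back to the plain Python loop.
    def njit(func):
        return func


@njit
def foo(arr):
    n = arr.shape[0]
    a = np.zeros(n)
    b = np.zeros(n)
    # Row 0 keeps the initial value 0 for both outputs.
    for i in range(1, n):
        a[i] = a[i - 1] + arr[i]   # previous a + current input
        b[i] = arr[i - 1] + a[i]   # previous input + current a
    return a, b


df = pd.DataFrame({"inputa": [1, 2, 3, 4]})

# Unpack the returned tuple and assign each array to its own column.
a_vals, b_vals = foo(df["inputa"].to_numpy())
df["a"] = a_vals
df["b"] = b_vals
print(df)
```

The resulting columns match the desired output in the question: a is 0, 2, 5, 9 and b is 0, 3, 7, 12 (stored as floats, because np.zeros defaults to float64).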