I am trying to use Numba to reduce the runtime of a computation over 250k rows. It can be done with df.loc, but df.loc is very slow when each row's output depends recursively on previous inputs and outputs.
Here is my input df:
inputa
1
2
3
4
and the desired output columns 'a' and 'b':
inputa  a  b
1       0  0
2       2  3
3       5  7
4       9  12
Both 'a' and 'b' start at 0. After the first row:
a = previous value of 'a' + current value of 'inputa'
b = previous value of 'inputa' + current value of 'a'
My current code is this:
@jit(nopython=True)
def foo(inputa):
    a = np.zeros(inputa.shape)
    b = np.zeros(inputa.shape)
    a[0] = 0
    b[0] = 0
    for i in range(1, a.shape[0]):
        a[i] = a[i-1] + inputa[i]
        b[i] = inputa[i-1] + a[i]
    return a, b

df[['a','b']] = foo(df['inputa'].values)
print(df)
However, I encountered this error:
TypingError: cannot determine Numba type of <class 'method'>
With a single output column, the Numba code works fine. See the code below.
@jit(nopython=True)
def foo(inputa):
    a = np.zeros(inputa.shape)
    #b = np.zeros(inputa.shape)
    a[0] = 0
    #b[0] = 0
    for i in range(1, a.shape[0]):
        a[i] = a[i-1] + inputa[i]
        #b[i] = inputa[i-1] + a[i]
    return a #, b

#df[['a','b']] = foo(df['inputa'].values)
df['a'] = foo(df['inputa'].values)
print(df)
However, when I try to return two output columns, I run into the error above. Basically, I want to compute multiple recursive columns without using df.loc.
Please advise.
Thanks a lot.
Your method is fine, except for a typo in the line b[i] = input[i-1] + a[i]. As @Jérôme Richard rightly pointed out, input is a built-in function in Python, so Numba cannot determine its type; the array there should be inputa.
import numba as nb
import numpy as np
import pandas as pd

# DataFrame
df = pd.DataFrame(np.arange(1,101),columns=['data'])
df_vals = df.values.ravel() # .ravel() flattens the 2D array into a 1D array

# Regular Python function
def foo(arr):
    arrlen = arr.shape[0]
    a = np.zeros(arrlen)
    b = np.zeros(arrlen)
    for i in range(1, a.shape[0]):
        a[i] = a[i-1] + arr[i]
        b[i] = arr[i-1] + a[i]
    return a, b

# Jitted (nopython) function
foo_nb = nb.njit()(foo)

# Numba warmup (the first call triggers compilation)
_ = foo_nb(df_vals)

# Unpack the two returned arrays and assign each to its own column
a_vals, b_vals = foo_nb(df_vals)
df['a'] = a_vals
df['b'] = b_vals

# Performance benchmarks. NOTE: %timeit only works in IPython / Jupyter Notebook.
print('Non Numba Code Performance : ')
foo_timeit = %timeit -o -n 1000 foo(df_vals)
print('\nNumba Code Performance : ')
foo_nb_timeit = %timeit -o -n 1000 foo_nb(df_vals)
Output :
Non Numba Code Performance :
289 µs ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Numba Code Performance :
1.22 µs ± 174 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
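As a quick sanity check, the same loop can be run on the four-row input from the question and compared against a vectorized form. This is only a sketch: `foo_vectorized` is a hypothetical helper that is not part of the code above, and it assumes a non-empty input. It relies on the observation that 'a' is the cumulative sum of the input with the first element subtracted, so for this particular recurrence NumPy alone may be enough. For non-integer data the two versions could differ by floating-point rounding.

```python
import numpy as np

def foo(arr):
    """Loop version from the answer (plain Python, no Numba needed here)."""
    n = arr.shape[0]
    a = np.zeros(n)
    b = np.zeros(n)
    for i in range(1, n):
        a[i] = a[i - 1] + arr[i]
        b[i] = arr[i - 1] + a[i]
    return a, b

def foo_vectorized(arr):
    """Hypothetical vectorized equivalent, for comparison only."""
    arr = np.asarray(arr, dtype=np.float64)
    a = np.cumsum(arr) - arr[0]   # a[0] = 0, a[i] = a[i-1] + arr[i]
    b = np.zeros_like(a)
    b[1:] = arr[:-1] + a[1:]      # b[0] = 0, b[i] = arr[i-1] + a[i]
    return a, b

# Input column from the question
inputa = np.array([1, 2, 3, 4])

a_loop, b_loop = foo(inputa)
a_vec, b_vec = foo_vectorized(inputa)

print(a_loop, b_loop)                     # a: 0, 2, 5, 9   b: 0, 3, 7, 12
print(np.array_equal(a_loop, a_vec))      # both versions agree on 'a'
print(np.array_equal(b_loop, b_vec))      # both versions agree on 'b'
```

Both versions reproduce the 'a' and 'b' columns from the desired output table in the question. The loop form remains the more general choice, since it also covers recurrences that cannot be written as a cumulative sum.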