I have a numpy array that I reference by column, e.g., df['x'], df['y'].
What is the best way to give this to Numba so I can run the function in nopython mode?
Or what is the best way to deal with dataframes in Numba, so I can access a column by name?
numba is designed to work with NumPy arrays directly, and it does not understand pandas objects in nopython mode. So it is simplest not to feed a dataframe or structured array to a numba function at all. Instead, extract the underlying arrays and pass them as separate arguments. For example:
from numba import njit

@njit
def func(A, B):
    # some logic
    arr = A + B
    return arr

df['z'] = func(df['x'].values, df['y'].values)
There is a special case where your dataframe series all have the same type. Check df.dtypes if you are not sure of your series types. In that case you can feed a single 2D array and perform the unpacking within the numba function:
@njit
def func(df_values):
    A, B = df_values[:, 0], df_values[:, 1]
    # some logic
    arr = A + B
    return arr

df['z'] = func(df[['x', 'y']].values)