python, pandas, numpy, jit, numba

What is the best way to get structured-array / dataframe-like structures in Numba?


I have a numpy array that I reference by column, e.g., df['x'], df['y'].

What is the best way to give this to Numba so I can run the function in nopython mode?

Or what is the best way to deal with dataframes in Numba, so I can access a column by name?


Solution

  • Supply 1d arrays as arguments

    Numba is designed to work with NumPy arrays directly, and a pandas dataframe is not a type it can compile against in nopython mode. So rather than feeding a dataframe or structured array to a numba function, pass each column as a separate 1d array argument. For example:

    from numba import njit
    
    @njit
    def func(A, B):
        # some logic
        arr = A + B
        return arr
    
    df['z'] = func(df['x'].values, df['y'].values)
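
The same idea applies when the data lives in a NumPy structured array rather than a dataframe. The following is a minimal sketch, not a definitive implementation: `rec`, its field names, and `add_columns` are hypothetical, and the `try`/`except` fallback exists only so the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to a no-op decorator so the sketch runs without Numba.
    def njit(func):
        return func


@njit
def add_columns(x, y):
    # Explicit loop over two 1d arrays; this compiles in nopython mode.
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        out[i] = x[i] + y[i]
    return out


# Hypothetical structured array with two float64 fields, accessed by name.
rec = np.zeros(3, dtype=[("x", np.float64), ("y", np.float64)])
rec["x"] = [1.0, 2.0, 3.0]
rec["y"] = [10.0, 20.0, 30.0]

# rec["x"] is a strided view into the records. Numba accepts strided views,
# so the contiguous copy here is optional and only a possible speed-up.
x = np.ascontiguousarray(rec["x"])
y = np.ascontiguousarray(rec["y"])

z = add_columns(x, y)
```

Field lookup by name stays in ordinary Python, and the compiled function only ever sees plain 1d arrays.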
    

  • Unpack 2d array within numba function

    This works only in the special case where all the columns you pass have the same dtype, so that .values yields a single homogeneous 2d array. Check df.dtypes if you are not sure of your column types. You can then feed a single array and perform the unpacking within numba:

    @njit
    def func(df_values):
        A, B = df_values[:, 0], df_values[:, 1]
        # some logic
        arr = A + B
        return arr
    
    df['z'] = func(df[['x', 'y']].values)
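
The dtype check mentioned above can be made explicit before building the 2d array. This is a sketch under stated assumptions: the example frame and the helper `uniform_values` are hypothetical and not part of pandas or Numba. Without such a check, pandas silently upcasts mixed numeric columns to a common dtype, and falls back to `object` when a column is non-numeric, which nopython mode cannot handle.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one float column and one integer column.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10, 20, 30]})


def uniform_values(frame, columns):
    """Return a 2d array for `columns`, refusing mixed dtypes (hypothetical helper)."""
    dtypes = frame[columns].dtypes
    if len(set(dtypes)) != 1:
        raise TypeError(f"columns have mixed dtypes: {dtypes.to_dict()}")
    return frame[columns].to_numpy()


# The mixed float/int frame is rejected instead of being silently upcast.
mixed_error = None
try:
    uniform_values(df, ["x", "y"])
except TypeError as exc:
    mixed_error = str(exc)

# Cast explicitly so the conversion is a deliberate choice, then extract.
values = uniform_values(df.astype({"y": np.float64}), ["x", "y"])
```

The resulting `values` array can then be passed to a function such as `func(df_values)` above.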