Tags: python, pandas, numpy, vectorization, numpy-ndarray

numpy vectorize np.prod Cannot construct a ufunc with more than 32 operands


I know there is a similar question here: Python numpy.vectorize: ValueError: Cannot construct a ufunc with more than 32 operands

But my case is different.

I have a DataFrame with 32 columns; you can create it by running the following code:

import numpy as np
import pandas as pd
from io import StringIO
dfs = """
    M0  M1  M2  M3 M4  M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30  age 
1   1   2   3    4  5   6  1  2 3    4  5  6   1   2    3  4  5    6   7   8    9 1    2  3    4  5    6  1    2   3    4   3.2        
2   7   5   4    5  8   3  1  2 3    4  5  6   1   2    3  4  5    6   7   8    9 1    2  3    4  5    6  1    2   3    4   4.5
3   4   8   9    3  5   2  1  2 3    4  5  6   1   2    3  4  5    6   7   8    9 1    2  3    4  5    6  1    2   3    4   6.7
"""
df = pd.read_csv(StringIO(dfs.strip()), sep=r'\s+')
df

Based on my business logic, I built a vectorized function with np.frompyfunc. As long as the function takes fewer than 32 parameters (here age plus M0..M29, i.e. 31), it works fine:

M = ["M0","M1","M2","M3","M4","M5","M6","M7","M8","M9","M10","M11","M12","M13","M14","M15","M16","M17","M18","M19",
     "M20","M21","M22","M23","M24","M25","M26","M27","M28","M29"]

def func2(df, M):
    return [df[i].values for i in M]

def func(age, *Ms):
    newcol = np.prod(Ms[0:age])
    return newcol

vfunc = np.frompyfunc(func, len(M)+1, 1)

df['newcol'] = vfunc(df['age'].values.astype(int), *func2(df, M))

For easier understanding: func2 only makes the code cleaner by generating all the M arguments for func. Without func2 the code would look like this:

def func(age,M0,M1,M2,...,M29):
    newcol=np.prod(Ms[0:age])
    return newcol

vfunc = np.frompyfunc(func, 31, 1)

df['newcol']=vfunc(df['age'].values.astype(int), df['M1'].values,...,df['M29'].values)
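
To pin down what func computes for each row, here is a minimal plain-loop sketch (not from the original post) that should give the same numbers as the frompyfunc version: the product of the first int(age) M values. It rebuilds the sample frame above programmatically so it runs on its own, and the helper name prod_first_n is hypothetical. Because the M values are passed as one 2-D array, the column count is not limited, although the loop runs at Python speed.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample frame: M0..M30 plus a float "age" column.
# Only the first six M values differ between rows; the remaining 25 repeat.
tail = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 8, 9,
        1, 2, 3, 4, 5, 6, 1, 2, 3, 4]
heads = [[1, 2, 3, 4, 5, 6], [7, 5, 4, 5, 8, 3], [4, 8, 9, 3, 5, 2]]
M = [f"M{i}" for i in range(31)]
df = pd.DataFrame([h + tail for h in heads], columns=M, index=[1, 2, 3])
df["age"] = [3.2, 4.5, 6.7]


def prod_first_n(frame, m_cols):
    """Hypothetical helper: per row, multiply the first int(age) values of m_cols."""
    ages = frame["age"].to_numpy().astype(int)  # same truncation as .astype(int) in the question
    values = frame[m_cols].to_numpy()           # one 2-D array, any number of columns
    return np.array([np.prod(row[:age]) for row, age in zip(values, ages)])


print(prod_first_n(df, M))  # 6, 700 and 8640 for the three sample rows
```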

The real problem appears once the number of parameters reaches 32 or more, as in this version (age plus M0..M30):

M = ["M0","M1","M2","M3","M4","M5","M6","M7","M8","M9","M10","M11","M12","M13","M14","M15","M16","M17","M18","M19",
     "M20","M21","M22","M23","M24","M25","M26","M27","M28","M29","M30"]  # M30 is the only difference from the above function

def func2(df, M):
    return [df[i].values for i in M]

def func(age, *Ms):
    newcol = np.prod(Ms[0:age])
    return newcol

vfunc = np.frompyfunc(func, len(M)+1, 1)

df['newcol'] = vfunc(df['age'].values.astype(int), *func2(df, M))

I received this error:

ValueError                                Traceback (most recent call last)
<ipython-input-66-9a042ad44f9b> in <module>()
     76     return newcol
     77 
---> 78 vfunc = np.frompyfunc(func, len(M)+1, 1)
     79 
     80 df['newcol']=vfunc(df['age'].values.astype(int), *func2(df,M))

ValueError: Cannot construct a ufunc with more than 32 operands (requested number were: inputs = 32 and outputs = 1)

In my real business logic I have more than 100 columns that need np.prod for this calculation, so this really has me stuck. Can anyone help?
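
Judging from the error message, the limit counts every operand of the ufunc (32 inputs + 1 output was rejected, while 31 + 1 worked). One possible workaround that keeps np.frompyfunc, sketched here as an assumption and not taken from the thread, is to pack each row's M values into a single tuple, so the ufunc only ever sees two inputs (age and the row) and one output, however many M columns there are. The sample frame is rebuilt so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample frame (M0..M30 plus a float "age" column).
tail = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 8, 9,
        1, 2, 3, 4, 5, 6, 1, 2, 3, 4]
heads = [[1, 2, 3, 4, 5, 6], [7, 5, 4, 5, 8, 3], [4, 8, 9, 3, 5, 2]]
M = [f"M{i}" for i in range(31)]
df = pd.DataFrame([h + tail for h in heads], columns=M, index=[1, 2, 3])
df["age"] = [3.2, 4.5, 6.7]


def func(age, row):
    # row is a tuple holding every M value of one DataFrame row
    return np.prod(row[0:age])


# 2 inputs + 1 output = 3 operands, independent of len(M)
vfunc = np.frompyfunc(func, 2, 1)

# One tuple per row; to_numpy() on a Series of tuples gives a 1-D object array
rows = df[M].apply(tuple, axis=1).to_numpy()
df["newcol"] = vfunc(df["age"].to_numpy().astype(int), rows)
print(df[["age", "newcol"]])
```

As with the original frompyfunc call, the result has object dtype, so a cast such as .astype(int) may be wanted afterwards. The per-row work still runs in Python, so this removes the operand limit but not the speed cost.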


Solution

  • Here is a way to get your result: select all the M columns with filter, use where to replace with NaN every value whose column lies beyond the row's age, then take the product along the columns with prod (which skips NaN by default).

    df['newcol'] = (
        # keep only the Mx columns
        df.filter(like='M')
          # keep a value only when its 0-based column position is below the
          # truncated age, mirroring Ms[0:age] after .astype(int) in the question
          .where(lambda x: np.arange(x.shape[1]) < df['age'].to_numpy().astype(int)[:, None])
          # multiply all the non-NaN values per row
          .prod(axis=1)
    )
    print(df)
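
The same masking idea also works in plain NumPy, without the NaN round trip. In this sketch (the helper name masked_row_prod is hypothetical), excluded entries are replaced with 1, the multiplicative identity, instead of NaN, and each 0-based column index is compared with the truncated age, which matches Ms[0:age] after .astype(int) in the question. All M columns travel as one 2-D array, so 31 columns or 100+ make no difference to the operand count.

```python
import numpy as np


def masked_row_prod(values, ages):
    """Hypothetical helper: product of the first int(age) entries of each row.

    values: 2-D array of shape (n_rows, n_cols)
    ages:   1-D sequence of length n_rows, truncated toward zero by astype(int)
    """
    values = np.asarray(values)
    n_keep = np.asarray(ages).astype(int)
    # True where the 0-based column index is inside the row's first n_keep columns
    mask = np.arange(values.shape[1]) < n_keep[:, None]
    # Excluded entries become 1, so they do not change the product
    return np.prod(np.where(mask, values, 1), axis=1)


# On the question's frame this would be called as, e.g.:
#   df["newcol"] = masked_row_prod(df.filter(like="M").to_numpy(), df["age"].to_numpy())
tail = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 8, 9,
        1, 2, 3, 4, 5, 6, 1, 2, 3, 4]
heads = [[1, 2, 3, 4, 5, 6], [7, 5, 4, 5, 8, 3], [4, 8, 9, 3, 5, 2]]
sample = np.array([h + tail for h in heads])      # shape (3, 31), like M0..M30
print(masked_row_prod(sample, [3.2, 4.5, 6.7]))   # 6, 700, 8640

# A 120-column input needs no special handling
wide = np.full((2, 120), 2)
print(masked_row_prod(wide, [10, 20]))            # 1024, 1048576
```

One caveat with 100+ columns: np.prod on an integer array can overflow int64 without any warning. Converting first with values.astype(float) trades exactness for range, while values.astype(object) keeps exact Python integers at a speed cost.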