python pandas optimization floating-accuracy

multiplying float columns in pandas takes too much time


Good morning all. I have a DataFrame with 460,000 rows and 15 columns. I'm trying to assign to one column the product of two others. The code looks like this:

df[df.colx == 'S']['prd'] = df['col1']*df['col2']

prd, col1 and col2 all have dtype float64. I have run many operations on other columns without problems, including date differences, and they execute almost instantly. If I try

df['prd'] =  df['col1']*df['col2']

the execution is super fast. The problem arises when I try to apply the operation to a subset of the DataFrame. Can someone help me and explain how I can lower the execution time? Thank you very much!

UPDATE: if I do

df2 = pd.DataFrame(df[df.colx=='S'])

and then

df2['prd'] =  df['col1']*df['col2']

it is still super slow. How is that possible? df2 should be a new DataFrame.


Solution

  • Try to separate the operations:

    # .copy() makes df2 an independent frame, so the assignment below is unambiguous
    df2 = df[df.colx == 'S'].copy()
    df2['prd'] = df2['col1'] * df2['col2']
    

    or, if df.colx == 'S' is a condition that decides which rows get the product, you can run:

    import numpy
    df['prd'] = numpy.where(df['colx'] == 'S', df['col1']*df['col2'], 'Do something else')
    

    Just replace 'Do something else' with the value or expression that should be used where df.colx != 'S'.
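
As a complement to the two suggestions above, here is a minimal sketch of how the conditional assignment could be written so that it updates the original frame directly. It shows two variants: a single `.loc` call with a boolean mask, and `numpy.where` with the existing `prd` values as the fallback. The toy data is hypothetical and only stands in for the real 460,000-row DataFrame; the column names `colx`, `col1`, `col2` and `prd` are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the real one has 460,000 rows and 15 columns.
df = pd.DataFrame({
    "colx": ["S", "T", "S", "T"],
    "col1": [1.0, 2.0, 3.0, 4.0],
    "col2": [10.0, 20.0, 30.0, 40.0],
    "prd": [-1.0, -1.0, -1.0, -1.0],
})

# Boolean mask selecting the rows that should receive the product.
mask = df["colx"] == "S"

# Variant A: numpy.where, keeping the current prd value where the mask is False.
# Computed into a separate array here so both variants can be compared.
via_where = np.where(mask, df["col1"] * df["col2"], df["prd"])

# Variant B: a single .loc call selects rows and column at once,
# so the assignment lands in df itself rather than in a temporary copy.
df.loc[mask, "prd"] = df.loc[mask, "col1"] * df.loc[mask, "col2"]

print(df)
```

Both variants leave the rows where `colx != 'S'` unchanged. The `.loc` form only multiplies the selected rows, while `numpy.where` computes the product for every row and then picks per row; which one is faster on a given frame is worth timing rather than assuming.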