Good morning all. I have a DataFrame with 460,000 rows and 15 columns. I'm trying to assign the product of two columns to a third column; the code looks like this:
df[df.colx == 'S']['prd'] = df['col1']*df['col2']
prd, col1 and col2 all have dtype float64. I have run many operations on other columns without any problem, including date differences, and they execute almost instantly. If I try
df['prd'] = df['col1']*df['col2']
the execution is super fast. The problem arises when I try to apply the operation to a subset of the DataFrame. Can someone help me and explain how I can lower the execution time? Thank you very much!
UPDATE: if I do
df2 = pd.DataFrame(df[df.colx=='S'])
and then
df2['prd'] = df['col1']*df['col2']
it is still super slow. How is that possible? df2 should be a new DataFrame.
Try to separate the operations:
df2 = df[df.colx == 'S'].copy()  # explicit copy avoids SettingWithCopyWarning on the next assignment
df2['prd'] = df2['col1']*df2['col2']
Or, if df.colx == 'S' is the condition you want to apply, you can run:
df['prd'] = numpy.where(df['colx'] == 'S', df['col1']*df['col2'], 'Do something else')
Just replace 'Do something else' with the value or expression that should be used for the rows where df.colx != 'S'.
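For reference, here is a minimal sketch of both approaches on a small made-up frame. The data values and the variable names mask and alt are my own assumptions, not from the question. Note that df[df.colx == 'S']['prd'] = ... is chained indexing: the assignment goes to a temporary copy, so the original df is not updated. A single .loc call with a boolean mask selects rows and column in one step and writes to df itself. I have not timed this on 460,000 rows, so treat it as something to try rather than a guaranteed speed-up.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real 460,000-row DataFrame.
df = pd.DataFrame({
    "colx": ["S", "N", "S", "N"],
    "col1": [1.0, 2.0, 3.0, 4.0],
    "col2": [10.0, 20.0, 30.0, 40.0],
    "prd": [0.0, 0.0, 0.0, 0.0],
})

# Boolean mask marking the rows to update.
mask = df["colx"] == "S"

# Option 1: one .loc call picks rows and column together, so the
# assignment lands in df itself instead of a temporary copy.
df.loc[mask, "prd"] = df.loc[mask, "col1"] * df.loc[mask, "col2"]

# Option 2: np.where computes the product everywhere and keeps the
# existing prd value for rows where the mask is False.
alt = np.where(mask, df["col1"] * df["col2"], df["prd"])
```

In Option 2, passing df["prd"] as the fallback keeps the result numeric; mixing floats with a string placeholder would give a non-float array.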