python  pandas  machine-learning  pandas-apply

Pandas condition on DataFrame returns TypeError: '>' not supported between instances of 'str' and 'int'


I'm working on a DataFrame using pandas and I need to add a new column based on some conditions.

My DataFrame is:

discount   tax   total   subtotal   productid
  3         0     20       13        002
  10        3     106      94        003
  46.49     6     21       20        004

I need to add a new column named Class to the DataFrame based on some conditions.

The conditions are as follows: if discount > 20 and total > 100 and tax == 0, then Class should be 1; otherwise it should be 0.

Here's what I have tried:

def conditions(s):
    if (s['discount'] > 20) and (s['tax'] == 0) and (s['total'] > 100):
        return 1
    else:
        return 0

df_full['Class'] = df_full.apply(conditions, axis=1)

But it raises an error:

TypeError: ("'>' not supported between instances of 'str' and 'int'", 'occurred at index 18')

How can I solve this issue?

Help me, please!

Thanks in advance!


Solution

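  • The error means that at least one of the compared columns contains strings, so it has to be converted to numeric before comparing. I suggest creating a boolean mask and casting it to int, so that True becomes 1 and False becomes 0. Also change and to &, which performs an element-wise AND on whole columns. Sample data: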

    print (df_full)
       discount  tax  total  subtotal productid
    0      3.00    0     20        13       002
    1     40.00    0    106        94       003
    2     46.49    6     21        20       004
    

    You can also list all rows with non-numeric values:

    print(df_full[pd.to_numeric(df_full['discount'], errors='coerce').isnull()])
    
    # convert to numeric; non-numeric values are converted to NaN
    df_full['discount'] = pd.to_numeric(df_full['discount'], errors='coerce')
    

    df_full['Class'] = ((df_full['discount'] > 20) & 
                        (df_full['tax'] == 0) & 
                        (df_full['total'] > 100)).astype(int)
    print (df_full)
       discount  tax  total  subtotal productid  Class
    0      3.00    0     20        13       002      0
    1     40.00    0    106        94       003      1
    2     46.49    6     21        20       004      0
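
For reference, the steps above can be put together in one self-contained sketch. The sample data here is hypothetical: it assumes the discount column was read in as strings and that one entry ('n/a') is not a number at all, which is one plausible cause of the TypeError, not necessarily what the real df_full contains.

```python
import pandas as pd

# Hypothetical sample data: 'discount' holds strings (object dtype),
# and the last entry cannot be parsed as a number.
df_full = pd.DataFrame({
    'discount': pd.Series(['3', '40', '46.49', 'n/a'], dtype=object),
    'tax': [0, 0, 6, 0],
    'total': [20, 106, 21, 150],
    'subtotal': [13, 94, 20, 140],
    'productid': ['002', '003', '004', '005'],
})

# An object dtype on a column that should be numeric is the warning sign.
print(df_full.dtypes)

# Convert to numeric; values that cannot be parsed become NaN.
numeric_discount = pd.to_numeric(df_full['discount'], errors='coerce')

# Rows whose discount could not be parsed, for inspection.
bad_rows = df_full[numeric_discount.isnull()]
print(bad_rows)

df_full['discount'] = numeric_discount

# Element-wise comparisons combined with &; a comparison against NaN is
# False, so rows with an unparseable discount end up with Class 0.
df_full['Class'] = ((df_full['discount'] > 20) &
                    (df_full['tax'] == 0) &
                    (df_full['total'] > 100)).astype(int)
print(df_full)
```

If the error comes from tax or total instead, the same pd.to_numeric conversion can be applied to those columns. Whether rows with unparseable values should get Class 0 or be dropped or corrected first depends on the data.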