Tags: python, pandas, dataframe, group-by, valueerror

Comparing values of a column with a variable to make a new column


I have a dataframe like this:

           Patch  Last reward  First reward  Difference    Name  Block_No.
group_id                                                                 
1             3          0.0           0.0         0.0  XYZ          1
2             4         43.0          54.0        11.0  XYZ          1
3             5          0.0           0.0         0.0  XYZ          2
4             6         40.0          65.0        25.0  XYZ          2
5             7          0.0           0.0         0.0  XYZ          3
6             0          0.0           0.0         0.0  XYZ          3

I want to create a new column called 'Rep_rate' based on the following condition: if Block_No. == 1 and Patch == 3, then Rep_rate = 4; otherwise Rep_rate = 0.

I tried doing this:

if (df_last['Block_No.']) == 1:
    for i in range(len(df_last)):
        if df_last['Patch'][i] == 1:
            rep = 8
        else:
            rep = 0
        df_last['Rep_Rate'] = rep

if (df_last['Block_No.']) == 2:
    for i in range(len(df_last)):
        if df_last['Patch'][i] == 1:
            rep = 4
        else:
            rep = 0
        df_last['Rep_Rate'] = rep

if (df_last['Block_No.']) == 3:
    for i in range(len(df_last)):
        if df_last['Patch'][i] == 1:
            rep = 8
        else:
            rep = 0
        df_last['Rep_Rate'] = rep

However, when I try this, I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().


   

Solution

  • I don't fully understand the logic you are using to populate the Rep_Rate column (called RR below, with Block standing in for Block_No.), but one approach might be:

    df['RR'] = 0  # default value for every row
    if (df['Block'] == 1).all():  # True only if every row has Block == 1
        for i, row in df.iterrows():
            if row['Patch'] == 3:
                df.loc[i, 'RR'] = 4  # df.loc edits the dataframe directly
    elif (df['Block'] == x).all():  # x, y, z are placeholders, not defined here
        for i, row in df.iterrows():
            if row['Patch'] == y:
                df.loc[i, 'RR'] = z
    # more if statements as needed
    

    Replace x, y, and z with whatever values you need.

    The issue you are having is with if df["Block"] == 1 (in your code, if (df_last['Block_No.']) == 1). The expression df["Block"] == 1 produces a boolean Series holding True or False for each value, depending on whether it equals 1. Using a Series directly as an if condition raises the ValueError because its truth value is ambiguous. Pandas provides ser.any() and ser.all() to reduce the Series to a single unambiguous boolean. For example:

    if (df["Block"] == 1).all():
        # more code ...
    

    would run the body of the if statement only if every value in the Block column were equal to 1.
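
Note that in the sample data from the question, Block_No. takes the values 1, 2, and 3, so a check such as (df_last['Block_No.'] == 1).all() would be False and no rows would be updated. If the goal is a row-by-row condition, one possible alternative is to combine boolean masks and skip the loop entirely. The following is only a sketch under that assumption: the dataframe is rebuilt here from the two relevant columns of the question's table, and the rule (Block_No. == 1 and Patch == 3 gives 4, everything else gives 0) is the one stated in the question.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question's table (only the two relevant columns).
df_last = pd.DataFrame(
    {
        "Patch": [3, 4, 5, 6, 7, 0],
        "Block_No.": [1, 1, 2, 2, 3, 3],
    },
    index=pd.Index([1, 2, 3, 4, 5, 6], name="group_id"),
)

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than ==.
mask = (df_last["Block_No."] == 1) & (df_last["Patch"] == 3)

# np.where picks 4 where the mask is True and 0 everywhere else.
df_last["Rep_Rate"] = np.where(mask, 4, 0)

print(df_last)
```

With this data, only the row with group_id 1 matches both conditions, so it is the only row that gets 4. If several block/patch rules are needed, the same kind of masks could be passed to numpy.select with a default of 0.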