Search code examples
pythonpandasstringconditional-statements

Python : 2 (or more) conditional strings in a groupby agg


Here is my issue : In my Pandas dataframe "Dataset", I would like to get the following result :

  • when the column "Product" has the value "Ref_01" at least once, I want to have "XY" as a result

  • but when the same column has at least once the value "Other_Ref", I also want to have "ABC" as a result

==> Therefore, if for one given grouped by "Number", we have at least one "Ref_01" and at least one "Other_Ref", I would like to have the following result : "XY, ABC".

However, with the following code, the "ABC" replaces the "XY", instead of completing it :

Result = Dataset.groupby(['Number']).agg(Data_Type=('Product',
                                                    lambda x: "XY" if (x=='Ref_01').sum() > 0 else "" \
        + "ABC" if (x=='Other_Ref').sum() > 0 else "" )).reset_index()

Thank you for your help.


Solution

  • How you are using the lambda function means you are concatenating with an empty string and the condition overwrites whatever value was originally there. To achieve what you are trying to do, you need to change your code so the values are concatenated rather than overwriting each other when both conditions are true.

    Try this code

    Result = Dataset.groupby(['Number']).agg(Data_Type=('Product', lambda x: "XY" * (x == 'Ref_01').any() + ", ABC" * (x == 'Other_Ref').any())).reset_index()
    
    

    Using any() checks if there are any instances of 'Ref_01' or 'Other_Ref' in the group, then the string multiplication gives either an empty string or the string you want made up of "XY, ABC".