Search code examples
pythonpandasdataframetransform

Pandas List All Unique Values Based On Groupby


I have a dataframe that has worksite info.

District#    Site#           Address
        1        1    123 Bayview Ln
        1        2    456 Example St
        2       36      789 Hello Dr
        2       44      789 Hello Dr

I am trying to transform this dataframe to add a column with the highest Site# as well as the distinct addresses when I group by District#. Here is an example of what I want the output to look like:

District#    Site#           Address    MaxSite#            All District Addresses
        1        1    123 Bayview Ln           2     123 Bayview Ln,456 Example St
        1        2    456 Example St           2     123 Bayview Ln,456 Example St
        2       36      789 Hello Dr          44                      789 Hello Dr
        2       44      789 Hello Dr          44                      789 Hello Dr

I am able to get the Max Site# by doing

df['MaxSite#'] = df.groupby(by='District#')['Site#'].transform('max')

But I am trying to find a similar way to list all of the unique addresses when I groupby District#.

I have tried doing .transform('unique') but that is not a valid function name and doing .agg(['unique']) returns dimensions that do not match


Solution

  • You can use groupby and agg to get the Max Site Number and List all the addresses

    Then merge back to the original dataframe:

    grouped_df = df.groupby('District#').agg(Max_Site_Num=('Site#', 'max'), 
          All_District_Addresses=('Address', lambda x: list(set(x))).reset_index()
    
    df = df.merge(grouped_df,on='District#')
    

    Output:

         District#  Site#         Address  Max_Site_Num            All_District_Addresses
    0          1      1  123 Bayview Ln             2  [123 Bayview Ln, 456 Example St]
    1          1      2  456 Example St             2  [123 Bayview Ln, 456 Example St]
    2          2     36    789 Hello Dr            44                    [789 Hello Dr]
    3          2     44    789 Hello Dr            44                    [789 Hello Dr]