Search code examples
pythonpandasgroup-by

Counting the unique values in a data frame and then appending the value in front of the string when grouped


I have the following data frame:

Name id Model
Alice alice_1 (A_01), (A_02)
Bob bob_1 (B_01)
Alice alice_2 (A_01), (A_05)
Alice alice_3 (A_01), (A_05)
Bob bob_2 (B_01)
Bob bob_3 (B_01)

I would like to count the unique model values inside the brackets and append the count in front of the bracket like that:

Name Model
Alice 3x (A_01), 2x (A_05), 1x (A_02)
Bob 3x(B_01)

I tried to use different approaches with group by and aggregate functions but could not find a way. Also I can use value counts and count each Model but then I don't know how to append the resulting number the the whole data frame.


Solution

  • Use Series.str.split with DataFrame.explode for new rows by joinjed values by , , then get counts by GroupBy.size, sorting and add to Model column, last aggregate join:

    df = (df.assign(Model = df['Model'].str.split(', '))
              .explode('Model')
              .groupby(['Name','Model'])
              .size()
              .sort_values(ascending=False)
              .astype(str)
              .add('x')
              .reset_index(level=1)
              .assign(Model = lambda x: x[0].str.cat(x['Model']))
              .groupby('Name')['Model']
              .agg(', '.join)
              .reset_index())
    print (df)
         Name                         Model
    0  Alice   3x(A_01), 2x(A_05), 1x(A_02)
    1    Bob                       3x(B_01)