Search code examples
pandasdataframegroup-bywhitespacecase-insensitive

Pandas Dataframe groupby on 3 columns and make one column lowercase


I have a dataframe:

   country rating         owner
0  England      a    John Smith
1  England      b    John Smith
2   France      a     Frank Foo
3   France      a     Frank foo
4   France      a  Frank    Foo
5   France      b     Frank Foo

I'd like to produce a count of owners after grouping by country and rating and

  • ignoring case
  • gnoring any spaces ( leading, trailing or inbetween)

I am expecting:

   country rating       owner  count
0  England      a  John Smith      1
1  England      b  John Smith      1
2   France      a   Frank Foo      3
3   France      b   Frank Foo      1

I have tried:

df.group_by(['rating','owner'])['owner'].count()

and

df.group_by(['rating','owner'].str.lower())['owner'].count()

Solution

  • Use title and replace to rework the string and groupby.size to aggregate:

    out = (df.groupby(['country', 'rating',
                       df['owner'].str.title().str.replace(r'\s+', ' ', regex=True)])
           .size().reset_index(name='count')
          )
    

    Output:

       country rating       owner  count
    0  England      a  John Smith      1
    1  England      b  John Smith      1
    2   France      a   Frank Foo      3
    3   France      b   Frank Foo      1