Tags: python, string, pandas, python-itertools

How to include the counts for each character while removing the duplicates using itertools.groupby


I have the following code, which concatenates the type values for each user:

df = pd.DataFrame(data=all_r_1.to_dataframe().groupby(['user_id'])['type'].sum()).reset_index()

This gives:

userid | type
20     | aab
21     | ababb

To collapse consecutive duplicate characters in the strings of the type column, I have this code:

df['type'] = df['type'].apply(lambda x: ''.join(ch for ch, _ in itertools.groupby(x)))

which produces this:

userid | type
20     | ab
21     | abab
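
As a rough illustration of why user 21 ends up with abab rather than ab: itertools.groupby only merges adjacent equal characters, so each new run starts a new group. A minimal standalone sketch on plain strings, without the DataFrame (the function name collapse_runs is made up for this example):

```python
import itertools


def collapse_runs(s):
    # Keep one character per run of adjacent equal characters.
    return "".join(ch for ch, _ in itertools.groupby(s))


print(collapse_runs("aab"))    # ab
print(collapse_runs("ababb"))  # abab
```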

For reference, this is the input df before grouping:

id | userid | type 
1  | 20     | a  
2  | 20     | a
3  | 20     | b
4  | 21     | a  
5  | 21     | b
6  | 21     | a
7  | 21     | b
8  | 21     | b

What I want, however, is to collapse the duplicates and append a count after each character that repeats:

userid | type
20     | a2b
21     | abab2

Any ideas how I can modify the itertools.groupby code to also include the counts?


Solution

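  • itertools.groupby yields each character together with an iterator over its run of consecutive occurrences, so the length of that run gives you the count:

    df['type'] = df['type'].apply(lambda x: ''.join('{}{}'.format(ch,len(list(group))) for ch, group in itertools.groupby(x)))

    Note that this appends a count after every character, including runs of length 1, so the sample rows become a2b1 and a1b1a1b2. To get exactly a2b and abab2 as in the question, the count has to be left out when it is 1.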
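
If the count should be dropped for characters that occur only once in a run (a2b rather than a2b1), one possible sketch is to move the logic into a small helper. The name encode_runs is hypothetical and not part of itertools or pandas; the example runs on plain strings so it does not depend on the original data:

```python
import itertools


def encode_runs(s):
    """Collapse consecutive duplicates, appending the run length only when it exceeds 1."""
    parts = []
    for ch, group in itertools.groupby(s):
        n = sum(1 for _ in group)  # length of this run of identical characters
        parts.append(ch if n == 1 else f"{ch}{n}")
    return "".join(parts)


print(encode_runs("aab"))    # a2b
print(encode_runs("ababb"))  # abab2
```

With the DataFrame from the question, this would presumably be applied as df['type'] = df['type'].apply(encode_runs).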