I have the following code, which concatenates the type values for each user:
df = pd.DataFrame(data=all_r_1.to_dataframe().groupby(['user_id'])['type'].sum()).reset_index()
It produces this:
userid | type
20 | aab
21 | ababb
To collapse consecutive duplicate characters in the strings in the type column, I have this code:
df['type'] = df['type'].apply(lambda x: ''.join(ch for ch, _ in itertools.groupby(x)))
which produces this:
userid | type
20 | ab
21 | abab
This is the input df:
id | userid | type
1 | 20 | a
2 | 20 | a
3 | 20 | b
4 | 21 | a
5 | 21 | b
6 | 21 | a
7 | 21 | b
8 | 21 | b
However, what I want is to remove the consecutive duplicates while also including the count for each repeated character:
userid | type
20 | a2b
21 | abab2
Any ideas on how I can modify the itertools.groupby code to also include the counts?
itertools.groupby yields each key together with an iterator over the items in that group, so you can count the items in each group as follows:
df['type'] = df['type'].apply(lambda x: ''.join('{}{}'.format(ch,len(list(group))) for ch, group in itertools.groupby(x)))
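Note that the one-liner above appends a count to every run, including runs of length 1, so 'aab' becomes 'a2b1' and 'ababb' becomes 'a1b1a1b2'. If the goal is the exact output shown in the question (a2b and abab2), the count has to be skipped for single characters. A minimal sketch of that variant, where encode_runs is a hypothetical helper name chosen for illustration and not part of itertools or pandas:

```python
import itertools


def encode_runs(s):
    """Collapse consecutive repeats; append the run length only when it exceeds 1.

    encode_runs is an illustrative name, not a library function.
    """
    parts = []
    for ch, group in itertools.groupby(s):
        n = sum(1 for _ in group)  # count the run without building a list
        parts.append(ch if n == 1 else '{}{}'.format(ch, n))
    return ''.join(parts)


# The two aggregated strings from the question
print(encode_runs('aab'))
print(encode_runs('ababb'))
```

With the frame from the question, this should work as a drop-in replacement for the lambda: df['type'] = df['type'].apply(encode_runs).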