I'm having trouble with preprocessing data. I have data that looks like
I want to group by a field that denotes the user called Account Number, and I want to create a new field that is a concatenation of all values of Customer Event Type for each Account Number.
I've tried this:
df_by_accnum = df.groupby('Account Number')[['Customer Event Type']].agg(','.join).reset_index()
but it results in all the column names being concatenated instead of hte values (https://i.sstatic.net/Um0aU.png)
Could I get some help trouble shooting this? Thank you
By try and error I found out that it might be due the type of the column Customer Event Type. I guess it might hold non string values, which join assumes, and for some reason the column names are used instead.
Try creating a new column converting Customer Event Type to strings and use that one:
>>> d = {'Account Number': [1, 2, 3, 1], 'Customer Event Type': [1, 1, 2, 2]}
>>> df = pd.DataFrame(data=d)
>>> df['Customer Event Type str'] = df['Customer Event Type'].astype(str)
>>> df.groupby('Account Number')[['Customer Event Type str']].agg(','.join).reset_index()
Account Number Customer Event Type str
0 1 1,2
1 2 1
2 3 2
Whereas using the numeric column produces:
>>> df.groupby('Account Number')[['Customer Event Type']].agg(','.join).reset_index()
Account Number Customer Event Type
0 1 Account Number,Customer Event Type,Customer Ev...
1 2 Account Number,Customer Event Type,Customer Ev...
2 3 Account Number,Customer Event Type,Customer Ev...
Hope it helps. Let us know if you find out the reason for this behavior. Thanks!