python pandas dataframe group-by preprocessor

Using groupby for user id and combining strings

I'm having trouble with preprocessing data. I have data that looks like

I want to group by a field that denotes the user called Account Number, and I want to create a new field that is a concatenation of all values of Customer Event Type for each Account Number.

I've tried this:

df_by_accnum = df.groupby('Account Number')[['Customer Event Type']].agg(','.join).reset_index()

but it results in all the column names being concatenated instead of hte values (https://i.sstatic.net/Um0aU.png)

Could I get some help trouble shooting this? Thank you

Solution

By try and error I found out that it might be due the type of the column Customer Event Type. I guess it might hold non string values, which join assumes, and for some reason the column names are used instead.

Try creating a new column converting Customer Event Type to strings and use that one:

>>> d = {'Account Number': [1, 2, 3, 1], 'Customer Event Type': [1, 1, 2, 2]}
>>> df = pd.DataFrame(data=d)
>>> df['Customer Event Type str'] = df['Customer Event Type'].astype(str)
>>> df.groupby('Account Number')[['Customer Event Type str']].agg(','.join).reset_index()
   Account Number Customer Event Type str
0               1                     1,2
1               2                       1
2               3                       2

Whereas using the numeric column produces:

>>> df.groupby('Account Number')[['Customer Event Type']].agg(','.join).reset_index()
   Account Number                                Customer Event Type
0               1  Account Number,Customer Event Type,Customer Ev...
1               2  Account Number,Customer Event Type,Customer Ev...
2               3  Account Number,Customer Event Type,Customer Ev...

Hope it helps. Let us know if you find out the reason for this behavior. Thanks!