I am trying to create a new column in this data frame. The data set has multiple records for each PERSON because each record is a different account. The new column values should be a combination of the values for each PERSON in the TYPE column. For example, if John Doe has four accounts the value next to his nae in the new column should be a concatenation of the values in TYPE. An example of the final data frame is below. Thanks in advance.
You can do this in two lines (first code, then explanation):
Code:
in: name_types = df.pivot_table(index='Name', values='AccountType', aggfunc=set)
out:
AccountType
Name
Jane Doe {D}
John Doe {L, W, D}
Larry Wild {L, D}
Patti Shortcake {L, W}
in: df['ClientType'] = df['Name'].apply(lambda x: name_types.loc[x]['AccountType'])
Explanation:
The pivot table gets all the AccountTypes for each individual name, and removes all duplicates using the 'set' aggregate function.
The apply function then iterates through each 'Name' in the main data frame, looks up the AccountType associated with each in name_typed, and adds it to the new column ClientType in the main dataframe.
And you're done!
Addendum: If you need the column to be a string instead of a set, use:
in: def to_string(the_set):
string = ''
for item in the_set:
string += item
return string
in: df['ClientType'] = df['ClientType'].apply(to_string)
in: df.head()
out:
Name AccountType ClientType
0 Jane Doe D D
1 John Doe D LDW
2 John Doe D LDW
3 John Doe L LDW
4 John Doe D LDW