Search code examples
pythonconcatenation

Creating a new column with concatenated values from another column


I am trying to create a new column in this data frame. The data set has multiple records for each PERSON because each record is a different account. The new column values should be a combination of the values for each PERSON in the TYPE column. For example, if John Doe has four accounts the value next to his nae in the new column should be a concatenation of the values in TYPE. An example of the final data frame is below. Thanks in advance.

enter image description here


Solution

  • You can do this in two lines (first code, then explanation):

    Code:

    in: name_types = df.pivot_table(index='Name', values='AccountType', aggfunc=set)
    out:
                    AccountType
    Name    
    Jane Doe        {D}
    John Doe        {L, W, D}
    Larry Wild      {L, D}
    Patti Shortcake {L, W}
    
    in: df['ClientType'] = df['Name'].apply(lambda x: name_types.loc[x]['AccountType'])
    

    Explanation:

    1. The pivot table gets all the AccountTypes for each individual name, and removes all duplicates using the 'set' aggregate function.

    2. The apply function then iterates through each 'Name' in the main data frame, looks up the AccountType associated with each in name_typed, and adds it to the new column ClientType in the main dataframe.

    And you're done!

    Addendum: If you need the column to be a string instead of a set, use:

    in: def to_string(the_set):
        string = ''
        for item in the_set:
            string += item
        return string
    
    in: df['ClientType'] = df['ClientType'].apply(to_string)
    
    in: df.head()
    out: 
    
    Name    AccountType ClientType
    0   Jane Doe    D   D
    1   John Doe    D   LDW
    2   John Doe    D   LDW
    3   John Doe    L   LDW
    4   John Doe    D   LDW