python, dataframe, data-science

How to combine rows with the same name in a pandas DataFrame


I have a simple dataframe like this:

df = pd.DataFrame({'class':['a','b','c','d','e'],
                  'name':['Adi','leon','adi','leo','andy'],
                  'age':['9','8','9','9','8'],
                  'score':['40','90','35','95','85']})

Printing it gives something like this:

 class  name   age  score
    a   Adi     9   40
    b   leon    8   90
    c   adi     9   35
    d   leo     9   95
    e   andy    8   85

How can I combine the rows named 'Adi' and 'adi' into a single row? They are the same person, so the score for 'Adi' should be 75 (40 + 35) rather than two separate rows with 40 and 35.


Solution

  • You could lowercase the name column first, then use pandas.DataFrame.groupby followed by aggregate on the resulting groups, with one aggregation function per column:

    import pandas as pd
    
    df = pd.DataFrame({
        'class': ['a', 'b', 'c', 'd', 'e'],
        'name': ['Adi', 'leon', 'adi', 'leo', 'andy'],
        'age': ['9', '8', '9', '9', '8'],
        'score': ['40', '90', '35', '95', '85']
    })
    # Normalize case so 'Adi' and 'adi' fall into the same group.
    df['name'] = df['name'].str.lower()
    # Scores are stored as strings; convert them so they add numerically.
    df['score'] = df['score'].astype(int)
    aggregate_funcs = {
        # Join the distinct values; sorted() keeps the order deterministic.
        'class': lambda s: ', '.join(sorted(set(s))),
        'age': lambda s: ', '.join(sorted(set(s))),
        'score': 'sum'
    }
    df = df.groupby('name').aggregate(aggregate_funcs)
    print(df)
    

    Output:

         class age  score
    name                 
    adi   a, c   9     75
    andy     e   8     85
    leo      d   9     95
    leon     b   8     90
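
  • If you would rather keep the original spelling of the name instead of overwriting it with the lowercase version, one possible variation is to group on a separate normalized key. This is only a sketch under a few assumptions: the helper column name_key and the output column classes are made-up names for illustration, the first spelling seen ('Adi') is the one kept, and age is assumed to be identical across rows for the same person.

```python
import pandas as pd

df = pd.DataFrame({
    'class': ['a', 'b', 'c', 'd', 'e'],
    'name': ['Adi', 'leon', 'adi', 'leo', 'andy'],
    'age': ['9', '8', '9', '9', '8'],
    'score': ['40', '90', '35', '95', '85'],
})

# Scores are strings in the question; convert them so they can be summed.
df['score'] = df['score'].astype(int)

# Hypothetical helper column: a case-insensitive key used only for grouping.
# strip() also guards against stray whitespace around the names.
df['name_key'] = df['name'].str.strip().str.casefold()

result = (
    df.groupby('name_key', sort=False)   # sort=False keeps first-seen order
      .agg(
          name=('name', 'first'),        # keep the first spelling encountered
          classes=('class', lambda s: ', '.join(sorted(set(s)))),
          age=('age', 'first'),          # assumes one age per person
          score=('score', 'sum'),
      )
      .reset_index(drop=True)            # drop the helper key from the result
)
print(result)
```

    With the sample data, the two 'Adi'/'adi' rows collapse into one row named 'Adi' with classes 'a, c' and score 75, and the other three rows are left unchanged. Whether to keep the first spelling or pick another rule (for example the most frequent one) depends on your data.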