I have a simple dataframe like this:
df = pd.DataFrame({'class':['a','b','c','d','e'],
'name':['Adi','leon','adi','leo','andy'],
'age':['9','8','9','9','8'],
'score':['40','90','35','95','85']})
The result looks like this:
class  name  age  score
a      Adi   9    40
b      leon  8    90
c      adi   9    35
d      leo   9    95
e      andy  8    85
How can I combine the rows named 'Adi' and 'adi' into a single row? They are the same person, so the score for 'Adi' should be 75 (40 + 35), not 40 and 35 in separate rows.
You could use pandas.DataFrame.groupby and pandas.DataFrame.aggregate after first making the name column lowercase:
import pandas as pd

df = pd.DataFrame({
    'class': ['a', 'b', 'c', 'd', 'e'],
    'name': ['Adi', 'leon', 'adi', 'leo', 'andy'],
    'age': ['9', '8', '9', '9', '8'],
    'score': ['40', '90', '35', '95', '85']
})

# Normalize case so 'Adi' and 'adi' fall into the same group
df['name'] = df['name'].str.lower()
# Scores are stored as strings; convert them so they can be summed
df['score'] = df['score'].astype(int)

# Join the distinct class/age values per person and add up the scores
aggregate_funcs = {
    'class': lambda s: ', '.join(set(s)),
    'age': lambda s: ', '.join(set(s)),
    'score': 'sum'
}

df = df.groupby('name').aggregate(aggregate_funcs)
print(df)
Output (the order of the joined values, e.g. "c, a", can vary between runs because sets are unordered):
     class age  score
name
adi   c, a   9     75
andy     e   8     85
leo      d   9     95
leon     b   8     90
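If you want the joined values in a stable order and would rather keep name as a regular column, a variant along these lines might work. This is only a sketch: join_unique is a hypothetical helper name introduced here for illustration, and it sorts the distinct values before joining them.

```python
import pandas as pd

df = pd.DataFrame({
    'class': ['a', 'b', 'c', 'd', 'e'],
    'name': ['Adi', 'leon', 'adi', 'leo', 'andy'],
    'age': ['9', '8', '9', '9', '8'],
    'score': ['40', '90', '35', '95', '85'],
})


def join_unique(values):
    """Hypothetical helper: join a group's distinct values in sorted order."""
    return ', '.join(sorted(set(values)))


result = (
    # Work on a copy with lowercase names and integer scores
    df.assign(name=df['name'].str.lower(), score=df['score'].astype(int))
      # as_index=False keeps 'name' as an ordinary column in the result
      .groupby('name', as_index=False)
      .agg({'class': join_unique, 'age': join_unique, 'score': 'sum'})
)
print(result)
```

With the sample data, the adi row ends up with class "a, c" and score 75, and the original df is left unchanged because assign returns a new DataFrame.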