From a two string columns pandas data frame looking like:
d = {'SCHOOL' : ['Yale', 'Yale', 'LBS', 'Harvard','UCLA', 'Harvard', 'HEC'],
'NAME' : ['John', 'Marc', 'Alex', 'Will', 'Will','Miller', 'Tom']}
df = pd.DataFrame(d)
Notice the relationship between NAME to SCHOOL is n to 1. I want to get the last school in case one person has gone to two different schools (see "Will" case).
So far I got:
df = df.groupby('NAME')['SCHOOL'].unique().reset_index()
Return:
NAME SCHOOL
0 Alex [LBS]
1 John [Yale]
2 Marc [Yale]
3 Miller [Harvard]
4 Tom [HEC]
5 Will [Harvard, UCLA]
PROBLEMS:
Both problems where solved based on @IanS comments.
Using last()
instead of unique()
:
df = df.groupby('NAME')['SCHOOL'].last().reset_index()
Return:
NAME SCHOOL
0 Alex LBS
1 John Yale
2 Marc Yale
3 Miller Harvard
4 Tom HEC
5 Will UCLA