I want to get the value of the last row when grouping by Name. For instance, the last iteration of the name Walter in row 2, I want to get Dog + ", " + Cat for Col1 and Beer + ", " + Wine in Col3. There are a lot of columns, so I would like to make it based on indexing/column position instead of column names.
+------+---------+-------+
| Col1 | Name | Col3 |
+------+---------+-------+
| Dog | Walter | Beer |
| Cat | Walter | Wine |
| Dog | Alfonso | Cider |
| Dog | Alfonso | Cider |
| Dog | Alfonso | Vodka |
+------+---------+-------+
This is the output I want:
+---------------+---------------------------+---------------------+
| Col1 | Name | Col3 |
+---------------+---------------------------+---------------------+
| Dog | Walter | Beer |
| Dog, Cat | Walter, Walter | Beer, Wine |
| Dog | Alfonso | Cider |
| Dog, Dog | Alfonso, Alfonso | Cider, Cider |
| Dog, Dog, Dog | Alfonso, Alfonso, Alfosno | Cider, Cider, Vodka |
+---------------+---------------------------+---------------------+
This is what I have tried (but does not work):
for i in df:
if df.loc[i,1] == df.loc[i+1,1]:
df.loc[i,0] + ", " + df.loc[i+1,0]
else:
df.loc[i+1,0]
I read that iterating over rows in pandas with a for-loop is frowned upon, so I would like to get the output by using vectorization or apply (or some other efficient way).
here is another way using accumulate
on the index and using df.agg
method:
from itertools import accumulate
import numpy as np
def fun(a):
l = [[i] for i in a.index]
acc = list(accumulate(l, lambda x, y: np.concatenate([x, y])))
return pd.concat([a.loc[idx].agg(','.join) for idx in acc],axis=1).T
out = pd.concat([fun(v) for k,v in df.groupby('Name',sort=False)])
print(out)
Col1 Name Col3
0 Dog Walter Beer
1 Dog,Cat Walter,Walter Beer,Wine
0 Dog Alfonso Cider
1 Dog,Dog Alfonso,Alfonso Cider,Cider
2 Dog,Dog,Dog Alfonso,Alfonso,Alfonso Cider,Cider,Vodka
You can add a reset index with drop=True
in the end to reset the indexes