Search code examples
pythonpandastransformationdata-wrangling

Add the value of the last row to to this row


I want to get the value of the last row when grouping by Name. For instance, the last iteration of the name Walter in row 2, I want to get Dog + ", " + Cat for Col1 and Beer + ", " + Wine in Col3. There are a lot of columns, so I would like to make it based on indexing/column position instead of column names.

+------+---------+-------+
| Col1 |  Name   | Col3  |
+------+---------+-------+
| Dog  | Walter  | Beer  |
| Cat  | Walter  | Wine  |
| Dog  | Alfonso | Cider |
| Dog  | Alfonso | Cider |
| Dog  | Alfonso | Vodka |
+------+---------+-------+

This is the output I want:

+---------------+---------------------------+---------------------+
|     Col1      |           Name            |        Col3         |
+---------------+---------------------------+---------------------+
| Dog           | Walter                    | Beer                |
| Dog, Cat      | Walter, Walter            | Beer, Wine          |
| Dog           | Alfonso                   | Cider               |
| Dog, Dog      | Alfonso, Alfonso          | Cider, Cider        |
| Dog, Dog, Dog | Alfonso, Alfonso, Alfosno | Cider, Cider, Vodka |
+---------------+---------------------------+---------------------+

This is what I have tried (but does not work):

for i in df:
    if df.loc[i,1] == df.loc[i+1,1]:
        df.loc[i,0] + ", " + df.loc[i+1,0]
    else:
        df.loc[i+1,0]

I read that iterating over rows in pandas with a for-loop is frowned upon, so I would like to get the output by using vectorization or apply (or some other efficient way).


Solution

  • here is another way using accumulate on the index and using df.agg method:

    from itertools import accumulate
    import numpy as np
    
    def fun(a):
        l = [[i] for i in a.index]
        acc = list(accumulate(l, lambda x, y: np.concatenate([x, y])))
        return pd.concat([a.loc[idx].agg(','.join) for idx in acc],axis=1).T
    out = pd.concat([fun(v) for k,v in df.groupby('Name',sort=False)])
    

    print(out)
              Col1                     Name               Col3
    0          Dog                   Walter               Beer
    1      Dog,Cat            Walter,Walter          Beer,Wine
    0          Dog                  Alfonso              Cider
    1      Dog,Dog          Alfonso,Alfonso        Cider,Cider
    2  Dog,Dog,Dog  Alfonso,Alfonso,Alfonso  Cider,Cider,Vodka
    

    You can add a reset index with drop=True in the end to reset the indexes