Search code examples
pythonpandasvelocitytimestep

Timestep multiple items dataframe


I have such dataframe:

product1 product2 product3 product4 product5 time
straws orange melon chair bread 1
melon milk book coffee cake 2
bread bananas juice chair book 3
straws coffee cake milk orange 4

I need to have the time step per items.

Example :

TimesProduct1 TimesProduct2 TimesProduct3 Timesproduct4 Timesproduct5
0 0 0 0 0
1 0 0 0 0
2 0 0 2 1
3 2 2 2 3

df.diff() unfortunatly doesn't work in that way.

Thanks for your help.


Solution

  • setup

    import pandas as pd
    
    df = pd.DataFrame(
        {
            "product1":["straws", "melon", "bread", "straws"],
            "product2":["orange", "milk", "bananas", "coffee"],
            "product3":["melon", "book", "juice", "cake"],
            "product4":["chair", "coffee", "chair", "milk"],
            "product5":["bread", "cake", "book", "orange"],
            "time":[1,2,3,4],
        }
    )
    

    solution

    result = (
        df.melt(id_vars="time")
        .groupby("value")
        .apply(lambda d: d.sort_values("time").eval("diff = time.diff()"))
        .pivot(index="time", columns="variable", values="diff")
        .fillna(0)
        .reset_index(drop=True)
    )
    

    solution 2 (for pandas bug workaround)

    def make_diff_column(df):
        df = df.sort_values("time")
        df["diff"] = df["time"].diff()
        return df
    
    
    result = (
        df.melt(id_vars="time")
        .groupby("value")
        .apply(make_diff_column)
        .pivot(index="time", columns="variable", values="diff")
        .fillna(0)
        .reset_index(drop=True)
    )