I have such dataframe:
product1 | product2 | product3 | product4 | product5 | time |
---|---|---|---|---|---|
straws | orange | melon | chair | bread | 1 |
melon | milk | book | coffee | cake | 2 |
bread | bananas | juice | chair | book | 3 |
straws | coffee | cake | milk | orange | 4 |
I need to have the time step per items.
Example :
TimesProduct1 | TimesProduct2 | TimesProduct3 | Timesproduct4 | Timesproduct5 |
---|---|---|---|---|
0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 2 | 1 |
3 | 2 | 2 | 2 | 3 |
df.diff() unfortunatly doesn't work in that way.
Thanks for your help.
setup
import pandas as pd
df = pd.DataFrame(
{
"product1":["straws", "melon", "bread", "straws"],
"product2":["orange", "milk", "bananas", "coffee"],
"product3":["melon", "book", "juice", "cake"],
"product4":["chair", "coffee", "chair", "milk"],
"product5":["bread", "cake", "book", "orange"],
"time":[1,2,3,4],
}
)
solution
result = (
df.melt(id_vars="time")
.groupby("value")
.apply(lambda d: d.sort_values("time").eval("diff = time.diff()"))
.pivot(index="time", columns="variable", values="diff")
.fillna(0)
.reset_index(drop=True)
)
solution 2 (for pandas bug workaround)
def make_diff_column(df):
df = df.sort_values("time")
df["diff"] = df["time"].diff()
return df
result = (
df.melt(id_vars="time")
.groupby("value")
.apply(make_diff_column)
.pivot(index="time", columns="variable", values="diff")
.fillna(0)
.reset_index(drop=True)
)