How do I add a column with the first date
per id
as category?
something like:
df = df.broadcast_assign(level="id", colname="Init", data=date_of(df.groupby("id").first()))`
import numpy as np
import pandas as pd
def time(t):
return pd.Timestamp("2019-01-01T12") + pd.to_timedelta(t, "d")
arrays = [
np.array([1, 1, 2, 2]),
np.array([time(0), time(1), time(396), time(365)]),
]
df = pd.DataFrame(np.round(np.random.rand(1,4).T, decimals=1), index=arrays, columns=["x"])
df.index.names = ["id", "time"]
returns
x
id time
1 2019-01-01 12:00:00 0.4
2019-01-02 12:00:00 0.6
2 2020-02-01 12:00:00 0.3
2020-01-01 12:00:00 0.9
x Init
id time
1 2019-01-01 12:00:00 0.4 2019-01-01
2019-01-02 12:00:00 0.6 2019-01-01
2 2020-02-01 12:00:00 0.3 2020-01-01
2020-01-01 12:00:00 0.9 2020-01-01
You can reset the time
level in multiindex then group it on id
and transform
using min
:
df['Init'] = df.reset_index('time')['time'].dt.date\
.groupby('id').transform('min').tolist()
x Init
id time
1 2019-01-01 12:00:00 0.1 2019-01-01
2019-01-02 12:00:00 0.1 2019-01-01
2 2020-02-01 12:00:00 0.9 2020-01-01
2020-01-01 12:00:00 0.1 2020-01-01