Search code examples
pythonpandasdataframepandas-groupbymulti-index

how to broadcast-add the first date per ID as column to multiindex dataframe?


How do I add a column with the first date per id as category?

something like:

df = df.broadcast_assign(level="id", colname="Init", data=date_of(df.groupby("id").first()))`

Data

import numpy as np
import pandas as pd

def time(t):
    return pd.Timestamp("2019-01-01T12") + pd.to_timedelta(t, "d")


arrays = [
    np.array([1, 1, 2, 2]),
    np.array([time(0), time(1), time(396), time(365)]),
]
df = pd.DataFrame(np.round(np.random.rand(1,4).T, decimals=1), index=arrays, columns=["x"])
df.index.names = ["id", "time"]

returns

                            x
id  time    
1   2019-01-01 12:00:00     0.4
    2019-01-02 12:00:00     0.6
2   2020-02-01 12:00:00     0.3
    2020-01-01 12:00:00     0.9

Target

                            x      Init
id  time    
1   2019-01-01 12:00:00     0.4    2019-01-01
    2019-01-02 12:00:00     0.6    2019-01-01
2   2020-02-01 12:00:00     0.3    2020-01-01
    2020-01-01 12:00:00     0.9    2020-01-01

Solution

  • You can reset the time level in multiindex then group it on id and transform using min:

    df['Init'] = df.reset_index('time')['time'].dt.date\
                   .groupby('id').transform('min').tolist()
    

                              x        Init
    id time                                
    1  2019-01-01 12:00:00  0.1  2019-01-01
       2019-01-02 12:00:00  0.1  2019-01-01
    2  2020-02-01 12:00:00  0.9  2020-01-01
       2020-01-01 12:00:00  0.1  2020-01-01