python pandas dataframe data-manipulation

Pandas: Fixing end dates in a changelog

I have a dataframe representing all changes that have been made to a record over time. Among other things, this dataframe contains a record id (in this case not unique and not meant to be as it tracks multiple changes to the same record on a different table), startdate and enddate. Enddate is only included if it is know/preset, often it is not. I would like to map the enddate of each change record to the startdate of the next record in the dataframe with the same id.

>>> thing = pd.DataFrame([                               
...     {'id':1,'startdate':date(2021,1,1),'enddate':date(2022,1,1)},
...     {'id':1,'startdate':date(2021,3,24),'enddate':None},
...     {'id':1,'startdate':date(2021,5,26),'enddate':None},
...     {'id':2,'startdate':date(2021,2,2),'enddate':None},
...     {'id':2,'startdate':date(2021,11,26),'enddate':None}
... ])
>>> thing
   id   startdate     enddate
0   1  2021-01-01  2022-01-01
1   1  2021-03-24        None
2   1  2021-05-26        None
3   2  2021-02-02        None
4   2  2021-11-26        None

The dataframe is already sorted by the creation timestamp of the record and the id. I tried this:

thing['enddate'] = thing.groupby('id')['startdate'].apply(lambda x: x.shift())

However the above code only maps this to around 10,000 of my 120,000 rows, the majority of which would have an enddate if I were to do this comparison by hand. Can anyone think of a better way to perform this kind of manipulation? For reference, give the dataframe above I'd like to create this one:

    >>> thing
   id   startdate     enddate
0   1  2021-01-01  2021-03-24
1   1  2021-03-24  2021-05-26
2   1  2021-05-26        None
3   2  2021-02-02  2021-11-26
4   2  2021-11-26        None

The idea is that once this transformation is done, I'll have a timeframe between which the configurations stored in the other columns (not impportant for this) were in place

Solution

here is one way to do it

use transform with the groupby to assign back the values to the rows comprising the group

df['enddate']=df.groupby(['id'])['startdate'].transform(lambda x: x.shift(-1))
df

    id  startdate   enddate
0   1   2021-01-01  2021-03-24
1   1   2021-03-24  2021-05-26
2   1   2021-05-26  NaT
3   2   2021-02-02  2021-11-26
4   2   2021-11-26  NaT