I have a dataframe where i wish to compute the number of days (timedelta) that a unique asset remained installed. The sample input dataframe is as follows
df =pd.DataFrame({'Date': ['2007-11-01','2012-03-02','2012-03-02','2013-04-01','2013-04-01','2017-11-15','2017-11-15'], 'action':['installed','installed','removed','installed','removed','installed','removed'], 'asset_alphnum':['A-3724','A-3534','A-3724','A2732','A-3534','A-2007','A2732']})
Outputs:
I have tried pd.crosstab
pd.crosstab(df.asset_alphnum, [df.Date, df.action])
However I cannot work out how to take it from here. Somehow need to collapse the hierarchical index and subract one date from the next.
Any guidance much appreciated.
Assuming you only ever have one install date and one removed date for each ID, you can do it with a pivot table, by using max (or min or any other aggregating function) on the date values.
df["Date"] = pd.to_datetime(df["Date"]) # Converting strings to datetimes
df = df.pivot_table(index="asset_alphnum", columns="action", values="Date", aggfunc=max) # max is just an example, most aggfuncs will work if there's only ever one date
df["time_installed"] = df["removed"] - df["installed"] # timedelta column
Which outputs this:
asset_alphnum | installed | removed | time_installed |
---|---|---|---|
A-2007 | 2017-11-15 00:00:00 | NaT | NaT |
A-3534 | 2012-03-02 00:00:00 | 2013-04-01 00:00:00 | 395 days 00:00:00 |
A-3724 | 2007-11-01 00:00:00 | 2012-03-02 00:00:00 | 1583 days 00:00:00 |
A2732 | 2013-04-01 00:00:00 | 2017-11-15 00:00:00 | 1689 days 00:00:00 |