python pandas group-by pandas-groupby crosstab

Finding a timedelta in pandas dataframe based upon specific values in one column

I have a dataframe where i wish to compute the number of days (timedelta) that a unique asset remained installed. The sample input dataframe is as follows

df =pd.DataFrame({'Date': ['2007-11-01','2012-03-02','2012-03-02','2013-04-01','2013-04-01','2017-11-15','2017-11-15'], 'action':['installed','installed','removed','installed','removed','installed','removed'], 'asset_alphnum':['A-3724','A-3534','A-3724','A2732','A-3534','A-2007','A2732']})

Outputs:

I have tried pd.crosstab

pd.crosstab(df.asset_alphnum, [df.Date, df.action])

outputs

However I cannot work out how to take it from here. Somehow need to collapse the hierarchical index and subract one date from the next.

Any guidance much appreciated.

Solution

Assuming you only ever have one install date and one removed date for each ID, you can do it with a pivot table, by using max (or min or any other aggregating function) on the date values.

df["Date"] = pd.to_datetime(df["Date"])  # Converting strings to datetimes
df = df.pivot_table(index="asset_alphnum", columns="action", values="Date", aggfunc=max)  # max is just an example, most aggfuncs will work if there's only ever one date
df["time_installed"] = df["removed"] - df["installed"]  # timedelta column

Which outputs this:

asset_alphnum	installed	removed	time_installed
A-2007	2017-11-15 00:00:00	NaT	NaT
A-3534	2012-03-02 00:00:00	2013-04-01 00:00:00	395 days 00:00:00
A-3724	2007-11-01 00:00:00	2012-03-02 00:00:00	1583 days 00:00:00
A2732	2013-04-01 00:00:00	2017-11-15 00:00:00	1689 days 00:00:00