I have a pd.DataFrame with duplicate timestamps in its index.
Example:
from datetime import datetime
import pandas as pd
time_index = [
datetime(2017, 1, 1, 0, 4, 1, 80000), datetime(2017, 1, 1, 0, 4, 1, 80000), datetime(2017, 1, 1, 0, 4, 1, 80000),
datetime(2017, 1, 1, 0, 4, 2), datetime(2017, 1, 1, 0, 4, 3)
]
df_i_have = pd.DataFrame([1, 2, 3, 4, 5], columns=['A'], index=time_index)
Notice that the first three rows have the exact same time.
My goal is to convert this data frame into one with a unique time index. More specifically, I would like to add a small timedelta to each duplicated time in the index,
so that I end up with a dataframe like this one:
new_time_index = [
datetime(2017, 1, 1, 0, 4, 1, 80000), datetime(2017, 1, 1, 0, 4, 1, 81000), datetime(2017, 1, 1, 0, 4, 1, 82000),
datetime(2017, 1, 1, 0, 4, 2), datetime(2017, 1, 1, 0, 4, 3)
]
df_i_want = pd.DataFrame([1, 2, 3, 4, 5], columns=['A'], index=new_time_index)
Can anyone please help with an efficient way to do this?
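You can use groupby and cumcount to number the rows within each group of identical timestamps (0, 1, 2, ...), and then add that count to the index as milliseconds: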
t = df_i_have.groupby(level=0).cumcount()
df_i_have.index += pd.to_timedelta(t,unit='ms')
print(df_i_have)
                         A
2017-01-01 00:04:01.080  1
2017-01-01 00:04:01.081  2
2017-01-01 00:04:01.082  3
2017-01-01 00:04:02.000  4
2017-01-01 00:04:03.000  5
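If you would rather not modify df_i_have in place, the same idea can be wrapped in a small helper. This is only a sketch: the function name make_index_unique and its unit parameter are made up for illustration, and it assumes the index is a DatetimeIndex. Note that a shifted timestamp could in principle land on a timestamp that already exists (for example, if the data already had a row at 00:04:01.081), so it is worth checking is_unique afterwards.

```python
from datetime import datetime

import pandas as pd


def make_index_unique(df, unit="ms"):
    """Hypothetical helper: return a copy of df whose duplicated
    timestamps are shifted by 0, 1, 2, ... steps of `unit`."""
    # Position of each row within its group of identical timestamps.
    offsets = df.groupby(level=0).cumcount().to_numpy()
    out = df.copy()
    # Plain NumPy offsets avoid any index alignment between the
    # cumcount Series and the (duplicated) DatetimeIndex.
    out.index = df.index + pd.to_timedelta(offsets, unit=unit)
    return out


time_index = [
    datetime(2017, 1, 1, 0, 4, 1, 80000),
    datetime(2017, 1, 1, 0, 4, 1, 80000),
    datetime(2017, 1, 1, 0, 4, 1, 80000),
    datetime(2017, 1, 1, 0, 4, 2),
    datetime(2017, 1, 1, 0, 4, 3),
]
df_i_have = pd.DataFrame([1, 2, 3, 4, 5], columns=["A"], index=time_index)

df_unique = make_index_unique(df_i_have)

# Sanity check: this would be False if a shifted timestamp
# collided with one that was already present.
print(df_unique.index.is_unique)
```

Passing unit="us" instead would shift by microseconds, which makes such a collision less likely if your data already has millisecond resolution.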