I am running many datetimeindexed dataframes through a function in which I slice the dataframe based on a start and end period. While, the same function runs fine for many dataframes it raises an unexplainable KeyError on some.
The data in the dataframes are identical in terms of datatypes, columns and formatting. Here is an example of a KeyError
:
Excerpt DataFrame that raises the error:
>> df_boiler_temp
>> @log_date @tariff_indicator #text
2017-04-23 00:12:48.802 none 65.0
2017-04-23 00:19:00.223 none 64.0
2017-04-23 00:24:02.544 none 63.0
2017-04-23 00:29:20.766 none 62.0
2017-04-23 00:35:00.088 none 61.0
2017-04-23 00:41:00.666 none 60.0
2017-04-23 00:46:00.632 none 59.0
2017-04-23 00:53:38.935 none 58.0
2017-04-23 00:59:21.152 none 57.0
2017-04-23 01:05:59.926 none 56.0
2017-04-23 01:09:58.652 none 65.0
2017-04-23 01:11:00.651 none 66.0
2017-04-23 01:14:10.577 none 67.0
2017-04-23 01:19:58.829 none 66.0
2017-04-23 01:28:00.635 none 65.0
Code that raises KeyError
df = df_boiler_temp.truncate(before=row['start'], after=row['end'])
(And so does this line which is essentially identical
df = df_boiler_temp[row['start']: row['end']]
)
The exception being raised is:
KeyError: 1492909671481000000L
Which translates to 2017-04-23 01:07:51.481000
in date time notation.
>> row['start']
>> 2017-04-23 01:07:51.481000
>> row['end']
>> 2017-04-23 02:24:07.953000
I don't understand why truncating with these values would raise a KeyError when the KeyError being raised falls perfectly between the datetimes that are present in the dataframe.Any idea in how I can solve this?
I don't understand why but I found this that pointed out that the dataset is not sorted. While as far as I have analysed the dataset it is already sorted the following line seemed to do the trick:
df = df_boiler_temp.sort_index().truncate(before=row['start'], after=row['end'])
EDIT: It seems that my dataset contained was copied and appended to the original dataset. Hence it was difficult to spot the duplicates.