Search code examples
pythondataframekeyerrortruncation

Slicing/Truncating TimeIndexed DataFrame raises KeyError


I am running many datetimeindexed dataframes through a function in which I slice the dataframe based on a start and end period. While, the same function runs fine for many dataframes it raises an unexplainable KeyError on some. The data in the dataframes are identical in terms of datatypes, columns and formatting. Here is an example of a KeyError:

Excerpt DataFrame that raises the error:

>> df_boiler_temp
>> @log_date @tariff_indicator  #text                                   
2017-04-23 00:12:48.802              none   65.0
2017-04-23 00:19:00.223              none   64.0
2017-04-23 00:24:02.544              none   63.0
2017-04-23 00:29:20.766              none   62.0
2017-04-23 00:35:00.088              none   61.0
2017-04-23 00:41:00.666              none   60.0
2017-04-23 00:46:00.632              none   59.0
2017-04-23 00:53:38.935              none   58.0
2017-04-23 00:59:21.152              none   57.0
2017-04-23 01:05:59.926              none   56.0
2017-04-23 01:09:58.652              none   65.0
2017-04-23 01:11:00.651              none   66.0
2017-04-23 01:14:10.577              none   67.0
2017-04-23 01:19:58.829              none   66.0
2017-04-23 01:28:00.635              none   65.0

Code that raises KeyError

df = df_boiler_temp.truncate(before=row['start'], after=row['end'])

(And so does this line which is essentially identical

df = df_boiler_temp[row['start']: row['end']]

)

The exception being raised is:

KeyError: 1492909671481000000L

Which translates to 2017-04-23 01:07:51.481000 in date time notation.

>> row['start']
>> 2017-04-23 01:07:51.481000

>> row['end']
>> 2017-04-23 02:24:07.953000

I don't understand why truncating with these values would raise a KeyError when the KeyError being raised falls perfectly between the datetimes that are present in the dataframe.Any idea in how I can solve this?


Solution

  • I don't understand why but I found this that pointed out that the dataset is not sorted. While as far as I have analysed the dataset it is already sorted the following line seemed to do the trick:

    df = df_boiler_temp.sort_index().truncate(before=row['start'], after=row['end'])
    

    EDIT: It seems that my dataset contained was copied and appended to the original dataset. Hence it was difficult to spot the duplicates.