Tags: python, pandas, timestamp, drop

Drop rows based on condition


I want to keep only the rows whose timestamp falls between July 4 and May 24 of the same year, so I'm using this code:

def fix_time(data):
    data['timestamp'] = pd.to_datetime(data['timestamp'], format="%d-%m-%Y %H:%M:%S")
    indexNames = data[ (data['timestamp'] < '24-05-2021 00:00:00') & (data['timestamp'] > '05-07-2021 00:00:00') ].index
    data.drop(indexNames , inplace=True)
    return data

But it doesn't work as intended: data['timestamp'].max() still returns 2021-09-30, which is not correct.


Solution

  • between works better for this:

    import pandas as pd

    def fix_time(data):
        data['timestamp'] = pd.to_datetime(data['timestamp'], format="%d-%m-%Y %H:%M:%S")
        # Keep only rows inside the range (both bounds are inclusive by default).
        return data[data['timestamp'].between('2021-05-07', '2021-05-24')]
    

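    Also, note that date strings compared against a datetime column should be written in ISO format, i.e., 2021-05-24 00:00:00 (yyyy-mm-dd) instead of 24-05-2021 00:00:00 (dd-mm-yyyy). A day-first string such as 05-07-2021 is ambiguous: by default pandas reads it month-first, as May 7 rather than July 5.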
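
As a minimal sketch of the approach in the answer above, the snippet below filters a small, made-up DataFrame with between. The sample rows and the variable names start, end, kept and ambiguous are assumptions for illustration only, and the bounds are the ones used in the answer (May 7 to May 24). Passing explicit pd.Timestamp bounds instead of strings sidesteps the day-first versus month-first question entirely.

```python
import pandas as pd

# Hypothetical sample data in the question's dd-mm-yyyy format.
data = pd.DataFrame({
    "timestamp": [
        "01-05-2021 08:00:00",
        "10-05-2021 12:30:00",
        "24-05-2021 00:00:00",
        "05-07-2021 09:15:00",
        "30-09-2021 23:59:59",
    ]
})
data["timestamp"] = pd.to_datetime(data["timestamp"], format="%d-%m-%Y %H:%M:%S")

# Explicit Timestamp bounds leave no room for string-parsing ambiguity.
start = pd.Timestamp(year=2021, month=5, day=7)
end = pd.Timestamp(year=2021, month=5, day=24)

# between() is inclusive on both ends by default and returns a boolean mask.
kept = data[data["timestamp"].between(start, end)]
print(kept)

# With default settings, this day-first-looking string is read month-first (May 7).
ambiguous = pd.Timestamp("05-07-2021")
print(ambiguous)
```

Indexing with the mask returns a filtered copy and leaves the original frame untouched, so max() should be checked on the returned frame rather than on data itself.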