Pandas Matching nearest Datetime values in 2 columns - type integer/long error


I have a DataFrame, D1:

    Date	Symbol	ICO_to
    5/28/2017 18:00	MYST	5/30/2017
    5/29/2017 18:00	MYST	5/30/2017
    5/30/2017 18:00	MYST	5/30/2017
    6/1/2017 18:00	MYST	5/30/2017
    6/2/2017 18:00	MYST	5/30/2017
    6/3/2017 18:00	MYST	5/30/2017
    6/4/2017 18:00	MYST	5/30/2017
    6/5/2017 18:00	MYST	5/30/2017
    6/6/2017 18:00	MYST	5/30/2017

Following a linked answer, I'm trying two methods to find the 'Date' value that is closest to the 'ICO_to' date value (every row has the same 'ICO_to' value). First I try truncate, which should remove the rows before that date:

D1.Date = pd.to_datetime(D1.Date) 

D1.rename(columns={'ICO to': 'ICO_to'}, inplace=True)
D1.ICO_to = pd.to_datetime(D1.ICO_to)

ICO_to = D1['ICO_to'][0] #All values in this column are the same, I just want to reference that value
ICO_to = pd.to_datetime(ICO_to) # to make sure the value is a datetime

First_date_row = D1['Date'].truncate(before=ICO_to).iloc[-1] #Remove all rows not after/= to the ICO_to date value

However I get this error:

TypeError: Cannot compare type 'Timestamp' with type 'long'

I know those are datetime values, so I'm not sure what the problem is; the ICO_to variable is a Timestamp. So I try this instead:

First_date_row = D1['Date'].loc[D1.index.get_loc(datetime.datetime(D1['ICO_to'][0]),method='nearest')] #Identify the row where 'Date' nearest matches 'ICO_to' value at row 0 

Using this instead of truncation, I get this error:

TypeError: an integer is required 

How can I either identify the Date value that most nearly matches the ICO_to value, or remove all rows before the closest match through truncation? Either method will work.


Solution

  • If you convert both columns to datetime objects, then you can do simple arithmetic on them to find the row with the minimum absolute distance.

    import pandas as pd
    
    D1.Date = pd.to_datetime(D1.Date)
    D1.ICO_to = pd.to_datetime(D1.ICO_to)
    D1[min(abs(D1.Date - D1.ICO_to)) == abs(D1.Date - D1.ICO_to)]
    
        Date    Symbol  ICO_to
    1   2017-05-29 18:00:00 MYST    2017-05-30 00:00:00
    

    As you can see, you'll need to be a bit careful with what you mean by close. Since you have hour information on the Date, but only a day on the ICO_to time, do you mean midnight or do you mean noon or any time at all during the day? The last option will complicate this method a bit.
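    To make that ambiguity concrete, here is a minimal sketch that compares three readings of "close" on a few rows copied from the question. The noon offset and the calendar-day comparison are assumptions made for illustration, not something the question specifies, and the names `to_midnight`, `to_noon` and `to_day` are made up for this example.

```python
import pandas as pd

# A few rows copied from the question's D1.
D1 = pd.DataFrame({
    "Date": pd.to_datetime(["5/28/2017 18:00", "5/29/2017 18:00",
                            "5/30/2017 18:00", "6/1/2017 18:00"]),
    "Symbol": ["MYST"] * 4,
    "ICO_to": pd.to_datetime(["5/30/2017"] * 4),
})

# Reading 1: ICO_to means midnight at the start of that day
# (this is what plain subtraction does).
to_midnight = (D1.Date - D1.ICO_to).abs()

# Reading 2 (assumption): ICO_to means noon on that day.
to_noon = (D1.Date - (D1.ICO_to + pd.Timedelta(hours=12))).abs()

# Reading 3 (assumption): any time during that day counts, so drop the
# time of day from Date and compare calendar days only.
to_day = (D1.Date.dt.normalize() - D1.ICO_to).abs()

# idxmin() returns the index label of the smallest distance.
nearest_midnight = D1.loc[to_midnight.idxmin(), "Date"]
nearest_noon = D1.loc[to_noon.idxmin(), "Date"]
nearest_day = D1.loc[to_day.idxmin(), "Date"]

print(nearest_midnight, nearest_noon, nearest_day)
```

    On these rows the midnight reading picks 5/29 18:00, while the noon and whole-day readings both pick 5/30 18:00, so it is worth deciding which one you mean before relying on the result.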

    If you want all rows of the DataFrame up to and including that closest match, you can do the following. First sort the DataFrame by Date and reset the index so the row labels follow date order, then keep every row whose index is less than or equal to the index where the minimum occurs.

    D1.sort_values(by='Date', inplace=True)
    D1.reset_index(drop=True, inplace=True)  # without inplace=True the result is discarded
    D1[D1.index <= D1[min(abs(D1.Date - D1.ICO_to)) == abs(D1.Date - D1.ICO_to)].index[0]]
    
        Date    Symbol  ICO_to
    0   2017-05-28 18:00:00 MYST    2017-05-30 00:00:00
    1   2017-05-29 18:00:00 MYST    2017-05-30 00:00:00
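    As an alternative sketch that stays closer to what the question attempted: `truncate` compares its `before`/`after` arguments against the index, not against a column. The question's frame presumably had a default integer index, which would explain the Timestamp-versus-long comparison error. Moving 'Date' into a sorted index makes both `truncate` and a nearest lookup work. The variable names below (`by_date`, `after`, `pos`) are made up for this example, and the data is copied from the question.

```python
import pandas as pd

# Data copied from the question's D1.
D1 = pd.DataFrame({
    "Date": pd.to_datetime([
        "5/28/2017 18:00", "5/29/2017 18:00", "5/30/2017 18:00",
        "6/1/2017 18:00", "6/2/2017 18:00", "6/3/2017 18:00",
        "6/4/2017 18:00", "6/5/2017 18:00", "6/6/2017 18:00",
    ]),
    "Symbol": ["MYST"] * 9,
    "ICO_to": pd.to_datetime(["5/30/2017"] * 9),
})

# Every row holds the same ICO_to value, so take the first one.
ICO_to = D1["ICO_to"].iloc[0]

# truncate() works on the index, so make Date a sorted DatetimeIndex.
by_date = D1.set_index("Date").sort_index()

# Drop every row whose Date is before ICO_to.
after = by_date.truncate(before=ICO_to)
first_date_on_or_after = after.index[0]

# Position of the index entry nearest to ICO_to
# (method="nearest" needs a sorted, unique index).
pos = by_date.index.get_indexer([ICO_to], method="nearest")[0]
nearest_date = by_date.index[pos]

print(first_date_on_or_after, nearest_date)
```

    Note that the two results differ here: the first row on or after ICO_to is 5/30 18:00, while the nearest row in absolute terms is 5/29 18:00, which matches the row found by the subtraction approach above.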