python, py-datatable

How to filter by date with python's datatable


I have the following datatable, which I would like to filter by dates greater than "2019-01-01". The problem is that the dates are strings.


dt_dates = dt.Frame({"days_date": ['2019-01-01','2019-01-02','2019-01-03']})

This is my best attempt.

dt_dates[f.days_date > datetime.strptime(f.days_date, "2019-01-01")]

This returns the error:

TypeError: strptime() argument 1 must be str, not Expr

What is the best way to filter dates in Python's datatable?

Reference

python datatable

f-expressions


Solution

  • Your datetime syntax for converting a string to a datetime is incorrect.

    What you're looking for is:

    dt_dates[f.days_date > datetime.strptime("2019-01-01", "%Y-%m-%d")]
    

    Here the first argument to strptime is the date string to parse, and the second argument is the date format.
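To make the argument order concrete, here is a minimal standard-library sketch. The names `cutoff`, `days`, and `later` are only illustrative, and the list stands in for the column values from the question.

```python
from datetime import datetime

# strptime(date_string, format): the first argument is the text to parse,
# the second describes how that text is laid out.
cutoff = datetime.strptime("2019-01-01", "%Y-%m-%d")

# Plain Python strings (not datatable expressions) can be parsed one by one
# and then compared chronologically against the cutoff.
days = ["2019-01-01", "2019-01-02", "2019-01-03"]
later = [d for d in days if datetime.strptime(d, "%Y-%m-%d") > cutoff]

print(cutoff)
print(later)
```

This also shows why the original attempt fails: strptime needs an actual string, so it cannot accept a column expression such as f.days_date.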

    However, let's take a step back, because this isn't the right way to do it.

    First, we should convert all the dates in your Frame to datetimes. I'll be honest, I've never used datatable, but the syntax looks extremely similar to a pandas DataFrame.

    In a DataFrame, we can do the following:

    df_date['days_date'] = df_date['days_date'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d'))
    

    This goes through each row of the 'days_date' column and converts each string into a datetime.

    From there, we can use a filter to get the relevant rows:

    df_date = df_date[df_date['days_date'] > datetime.strptime("2019-01-01", "%Y-%m-%d")]
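Putting the two steps together, a self-contained pandas sketch might look like the following. It assumes pandas is installed and uses pd.to_datetime as a vectorized stand-in for the apply/strptime call above. This is pandas, not datatable, so treat it as an illustration of the approach rather than a datatable recipe.

```python
import pandas as pd

# Same sample data as in the question, held in a pandas DataFrame.
df_date = pd.DataFrame({"days_date": ["2019-01-01", "2019-01-02", "2019-01-03"]})

# Convert the string column to datetimes in one vectorized call.
df_date["days_date"] = pd.to_datetime(df_date["days_date"], format="%Y-%m-%d")

# Keep only the rows strictly after the cutoff date.
cutoff = pd.Timestamp("2019-01-01")
filtered = df_date[df_date["days_date"] > cutoff]

print(filtered)
```

If the data already lives in a datatable Frame, one option is to call its to_pandas() method first and then apply the same two steps.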