Search code examples
pythondatetimepython-polars

Sort list of datetime in Python polars dataframe


I have problems working with datetime

my dataset is like this:

import datetime
import polars as pl

df = pl.DataFrame({
    'customer_unique_id': ['a', 'b', 'c', 'd', 'e'],
    'order_purchase_timestamp': [[datetime.datetime(2018, 8, 13, 9, 14, 7),
      datetime.datetime(2018, 7, 28, 0, 23, 49)],
     [datetime.datetime(2017, 7, 27, 14, 13, 3),
      datetime.datetime(2018, 1, 14, 7, 36, 54)],
     [datetime.datetime(2017, 9, 1, 12, 11, 23),
      datetime.datetime(2018, 5, 26, 19, 42, 48)],
     [datetime.datetime(2018, 2, 19, 17, 11, 34),
      datetime.datetime(2018, 3, 22, 18, 9, 41)],
     [datetime.datetime(2018, 5, 23, 20, 14, 21),
      datetime.datetime(2018, 6, 3, 10, 12, 57)]]
})

I wish to sort the dates inside the list on every row, and get the first and second datetime, because some rows has more than 2 values.

I'm trying with:

df.filter(
    pl.col("order_purchase_timestamp").list.sort()
)

but I got this error:

ComputeError: filter predicate must be of type Boolean, got list[datetime[μs]]


Solution

  • You can use .with_columns() context to select & modify columns.

    col = pl.col("order_purchase_timestamp")
    
    df.with_columns(
        col.list.sort(descending=True).list.slice(0, 2)
    )   #        /                      /
        #      sort by date            take slice [0:2]
    

    Solution that follows from the discussion in the comments

    To perform some action under column based on condition, you can use pl.when -> then -> otherwise construct. pl.when() takes some boolean Series (condition). Read more here.

    col = pl.col("order_purchase_timestamp")
    
    # duration (difference) between 2 dates (in microseconds)
    diff = pl.duration(microseconds=(col.list.get(0) - col.list.get(1)))
    
    df.filter(col.list.len() >= 2).with_columns(
        col.list.sort(descending=True).list.slice(0, 2)
    ).with_columns(
        less_30 = pl.when(diff <= pl.duration(days=30))\
                    .then(True).otherwise(False)
    )