python, python-polars

How to select rows that have more than one value in a list[datetime[µs]] column?


I am trying to learn the Polars package. I have a dataframe with a datetime column, which I imported from a CSV file: df_order_dataset = pl.read_csv(file = '/content/drive/MyDrive/Colab_Notebooks/Olist/Datasets/raw_datasets/olist_orders_dataset.csv', parse_dates=True)

The column type (second column) was as shown in the image below.

enter image description here

Some rows have more than one value, like the last row in the image above. I'm interested in the rows where the second column has more than one value. I thought of filtering by the list size, but I can't get the list size of each row.

How can I filter for the rows that have two or more values?


Solution

  • It should be something like this:

    df.filter(
        pl.col("order_purchase_timestamp").list.len() >= 2
    )