Search code examples
pythonpython-polars

How to compare date values from rows in python polars?


I have a dataframe with date of births as

pl.DataFrame({'idx':[1,2,3,4,5,6],
              'date_of_birth':['03/06/1990','3/06/1990','11/12/2000','01/02/2021','1/02/2021','3/06/1990']})

enter image description here

Here I would like to compare date of birth(Format: Month/Day/Year) of each row and tag yes if the months are equal such as 03 - 3, 01 -1.

There are dates as 03/06/1900, 3/06/1990, they are generally same. but here they are treated as different. How to figure out these kind scenarios ?

The expected output as:

enter image description here


Solution

  • Unlike other answers that use pandas to compute the result, let me do your CPU a favor and give a polars only solution.

    df = pl.DataFrame({"idx":[1,2,3,4,5,6],
                  "date_of_birth":["03/06/1990","3/06/1990","11/12/2000","01/02/2021","1/02/2021","3/06/1990"]})
    
    
    df.with_columns(
       pl.when(
           pl.col("date_of_birth").str.to_date("%m/%d/%Y").dt.month().is_in([1, 3])
       )
       .then(pl.lit("Yes"))
       .otherwise(pl.lit("No"))
       .alias("match")
    )
    
    shape: (6, 3)
    ┌─────┬───────────────┬───────┐
    │ idx ┆ date_of_birth ┆ match │
    │ --- ┆ ---           ┆ ---   │
    │ i64 ┆ str           ┆ str   │
    ╞═════╪═══════════════╪═══════╡
    │ 1   ┆ 03/06/1990    ┆ Yes   │
    │ 2   ┆ 3/06/1990     ┆ Yes   │
    │ 3   ┆ 11/12/2000    ┆ No    │
    │ 4   ┆ 01/02/2021    ┆ Yes   │
    │ 5   ┆ 1/02/2021     ┆ Yes   │
    │ 6   ┆ 3/06/1990     ┆ Yes   │
    └─────┴───────────────┴───────┘