I have a dataframe with date of births as
pl.DataFrame({'idx':[1,2,3,4,5,6],
'date_of_birth':['03/06/1990','3/06/1990','11/12/2000','01/02/2021','1/02/2021','3/06/1990']})
Here I would like to compare date of birth(Format: Month/Day/Year) of each row and tag yes if the months are equal such as 03 - 3, 01 -1.
There are dates as 03/06/1900, 3/06/1990, they are generally same. but here they are treated as different. How to figure out these kind scenarios ?
The expected output as:
Unlike other answers that use pandas
to compute the result, let me do your CPU a favor and give a polars
only solution.
df = pl.DataFrame({"idx":[1,2,3,4,5,6],
"date_of_birth":["03/06/1990","3/06/1990","11/12/2000","01/02/2021","1/02/2021","3/06/1990"]})
df.with_columns(
pl.when(
pl.col("date_of_birth").str.to_date("%m/%d/%Y").dt.month().is_in([1, 3])
)
.then(pl.lit("Yes"))
.otherwise(pl.lit("No"))
.alias("match")
)
shape: (6, 3)
┌─────┬───────────────┬───────┐
│ idx ┆ date_of_birth ┆ match │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═══════════════╪═══════╡
│ 1 ┆ 03/06/1990 ┆ Yes │
│ 2 ┆ 3/06/1990 ┆ Yes │
│ 3 ┆ 11/12/2000 ┆ No │
│ 4 ┆ 01/02/2021 ┆ Yes │
│ 5 ┆ 1/02/2021 ┆ Yes │
│ 6 ┆ 3/06/1990 ┆ Yes │
└─────┴───────────────┴───────┘