A scalable way of checking if a string column is contained within another string column in Polars


Is there a scalable way of creating the column B_in_A below that doesn't rely on map_elements?

import polars as pl

df = pl.DataFrame({"A":["foo","bar","foo"],"B":["f","b","s"]})

df = (
    df
    .with_columns(
        pl.struct(["A", "B"])
        # Row-wise Python check: runs the lambda once per row, so it is slow
        .map_elements(lambda row: row["B"] in row["A"], return_dtype=pl.Boolean)
        .alias("B_in_A")
    )
)
print(df)

The output is:

shape: (3, 3)
┌─────┬─────┬────────┐
│ A   ┆ B   ┆ B_in_A │
│ --- ┆ --- ┆ ---    │
│ str ┆ str ┆ bool   │
╞═════╪═════╪════════╡
│ foo ┆ f   ┆ true   │
│ bar ┆ b   ┆ true   │
│ foo ┆ s   ┆ false  │
└─────┴─────┴────────┘

Solution

  • Use str.contains, which accepts an expression as the pattern and so avoids map_elements:

    df.with_columns(B_in_A=pl.col('A').str.contains(pl.col('B')))

    shape: (3, 3)
    ┌─────┬─────┬────────┐
    │ A   ┆ B   ┆ B_in_A │
    │ --- ┆ --- ┆ ---    │
    │ str ┆ str ┆ bool   │
    ╞═════╪═════╪════════╡
    │ foo ┆ f   ┆ true   │
    │ bar ┆ b   ┆ true   │
    │ foo ┆ s   ┆ false  │
    └─────┴─────┴────────┘