I'm trying to check whether string_1 = "this example string" contains a column value as a substring. For example, the first value in Col B should be True, since "example" is a substring of string_1.
import polars as pl

string_1 = "this example string"
df = pl.from_repr("""
┌────────┬─────────┬────────────┐
│ Col A ┆ Col B ┆ Col C │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞════════╪═════════╪════════════╡
│ 448220 ┆ example ┆ 7101936801 │
│ 518398 ┆ 99999 ┆ 9999900091 │
│ 557232 ┆ 424570 ┆ 4245742060 │
└────────┴─────────┴────────────┘
""")
This is what I have tried so far, but it's returning the following error:
df=df.with_columns(pl.col("Col B").apply(lambda x: x in string_1).alias("new_col"))
AttributeError: 'Expr' object has no attribute 'apply'
It's generally better to avoid Python functions and use native Polars expressions instead. Use pl.lit() to create a dummy column from string_1 and str.contains() to check whether that string contains a column value:
(
df
.with_columns(pl.lit(string_1).str.contains(pl.col('Col B')).alias('new_col'))
)
Or you can use name.keep() if you want to check all columns; each column is then replaced by its boolean result:
(
df
.with_columns(pl.lit(string_1).str.contains(pl.all()).name.keep())
)
┌───────┬───────┬───────┐
│ Col A ┆ Col B ┆ Col C │
│ --- ┆ --- ┆ --- │
│ bool ┆ bool ┆ bool │
╞═══════╪═══════╪═══════╡
│ false ┆ true ┆ false │
│ false ┆ false ┆ false │
│ false ┆ false ┆ false │
└───────┴───────┴───────┘
Or use name.suffix() if you want to keep the original columns and add the results as new columns:
(
df
.with_columns(pl.lit(string_1).str.contains(pl.all()).name.suffix('_match'))
)
┌────────┬─────────┬────────────┬─────────────┬─────────────┬─────────────┐
│ Col A ┆ Col B ┆ Col C ┆ Col A_match ┆ Col B_match ┆ Col C_match │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ bool ┆ bool ┆ bool │
╞════════╪═════════╪════════════╪═════════════╪═════════════╪═════════════╡
│ 448220 ┆ example ┆ 7101936801 ┆ false ┆ true ┆ false │
│ 518398 ┆ 99999 ┆ 9999900091 ┆ false ┆ false ┆ false │
│ 557232 ┆ 424570 ┆ 4245742060 ┆ false ┆ false ┆ false │
└────────┴─────────┴────────────┴─────────────┴─────────────┴─────────────┘