
Polars str.starts_with() with values from another column


I have a polars DataFrame for example:

>>> import polars as pl
>>> df = pl.DataFrame({'A': ['a', 'b', 'c', 'd'], 'B': ['app', 'nop', 'cap', 'tab']})
>>> df
shape: (4, 2)
┌─────┬─────┐
│ A   ┆ B   │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a   ┆ app │
│ b   ┆ nop │
│ c   ┆ cap │
│ d   ┆ tab │
└─────┴─────┘

I'm trying to get a third column C that is True if the string in column B starts with the string in column A of the same row, and False otherwise. For the example above, I'd expect:

┌─────┬─────┬───────┐
│ A   ┆ B   ┆ C     │
│ --- ┆ --- ┆ ---   │
│ str ┆ str ┆ bool  │
╞═════╪═════╪═══════╡
│ a   ┆ app ┆ true  │
│ b   ┆ nop ┆ false │
│ c   ┆ cap ┆ true  │
│ d   ┆ tab ┆ false │
└─────┴─────┴───────┘
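As a plain-Python reference for that rule (no Polars involved), column C is just a row-by-row str.startswith check. The helper name expected_c is hypothetical and only meant for checking results against the table above.

```python
# Hypothetical helper: pure-Python reference for the expected column C.
def expected_c(a_values, b_values):
    # Row i is True when b_values[i] begins with a_values[i].
    return [b.startswith(a) for a, b in zip(a_values, b_values)]


A = ["a", "b", "c", "d"]
B = ["app", "nop", "cap", "tab"]
print(expected_c(A, B))  # [True, False, True, False]
```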

I'm aware of the df['B'].str.starts_with() method, but passing it a column raised:

>>> df['B'].str.starts_with(pl.col('A'))
...  # Some stuff here.
TypeError: argument 'sub': 'Expr' object cannot be converted to 'PyString'

What's the right way to do this in Polars? In pandas, you would do:

df.apply(lambda d: d['B'].startswith(d['A']), axis=1)

Solution

  • Expression support was added for .str.starts_with() in pull/6355 as part of the Polars 0.15.17 release.

    df.with_columns(pl.col("B").str.starts_with(pl.col("A")).alias("C"))
    
    shape: (4, 3)
    ┌─────┬─────┬───────┐
    │ A   ┆ B   ┆ C     │
    │ --- ┆ --- ┆ ---   │
    │ str ┆ str ┆ bool  │
    ╞═════╪═════╪═══════╡
    │ a   ┆ app ┆ true  │
    │ b   ┆ nop ┆ false │
    │ c   ┆ cap ┆ true  │
    │ d   ┆ tab ┆ false │
    └─────┴─────┴───────┘