polars DataFrame - search strings from list


For each string in a column, I need to find which substrings from a list it contains. I am looking for an efficient way to do so.

Slow version:

import polars as pl

def search_text(queries, text):
    return [query for query in queries if query in text]


pl_df = pl.DataFrame( {
        "Title": ["I am aa", "I am bbob"]
    })

queries = ['aa', 'bb']

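# map_elements calls the Python function once per row, which is why this is slow.
pl_df = pl_df.with_columns(
    pl.col('Title')
    .map_elements(lambda text: search_text(queries, text), return_dtype=pl.List(pl.String))
    .alias('Title_match')
)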

print(pl_df)
shape: (2, 2)
┌───────────┬─────────────┐
│ Title     ┆ Title_match │
│ ---       ┆ ---         │
│ str       ┆ list[str]   │
╞═══════════╪═════════════╡
│ I am aa   ┆ ["aa"]      │
│ I am bbob ┆ ["bb"]      │
└───────────┴─────────────┘

Solution

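  • You can use .str.extract_many(), which uses the Aho-Corasick algorithm to find all the patterns natively instead of calling a Python function for every row:

    pl_df.with_columns(Title_match = pl.col.Title.str.extract_many(queries))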
    
    shape: (2, 2)
    ┌───────────┬─────────────┐
    │ Title     ┆ Title_match │
    │ ---       ┆ ---         │
    │ str       ┆ list[str]   │
    ╞═══════════╪═════════════╡
    │ I am aa   ┆ ["aa"]      │
    │ I am bbob ┆ ["bb"]      │
    └───────────┴─────────────┘