
Use str.extract to extract multiple matches in polars


import polars as pl

df = pl.DataFrame(
    {
        "a": [
            "name=John, name=Billy",
            "name=Jeff",
            "name=Taylor",
        ]
    }
)

df.select(
    pl.col("a").str.extract(r"name=(\w+)", 1),
)

This returns John, Jeff, and Taylor, i.e. only the first match in each row. Is there a way to extract all matches (something like extract_all) so that Billy is included too? I realize this changes the shape of the result, but I'm wondering whether such a method exists.


Solution

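  • Yes, there is a str.extract_all method. It returns a list of every full match of the pattern (not the capture group), so explode the list to get one match per row, then apply str.extract to pull out just the name.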

    df.select(
        pl.col("a")
        .str.extract_all(r"name=\w+")
        .explode()
        .str.extract(r"name=(\w+)")
        .alias("names")
    )
    
    shape: (4, 1)
    ┌────────┐
    │ names  │
    │ ---    │
    │ str    │
    ╞════════╡
    │ John   │
    │ Billy  │
    │ Jeff   │
    │ Taylor │
    └────────┘