Tags: python, python-polars

str.replace_all() using regex


I have a rather simple task.

Say I have a column called path that looks something like this:

pl.Series('path', ['[1, Phone], [2, Tablet], [3, Tablet], [4, Phone], [5, Phone], [6, Phone]'])

I'd like to replace the commas between the blocks with a hyphen (-), so the result looks like this:

[1, Phone]-[2, Tablet]-[3, Tablet]-[4, Phone]-[5, Phone]-[6, Phone]

I tried the pattern [a-z]\](,)\s, where (,) is a capturing group. However, this doesn't work, since it replaces the entire match rather than just the comma.

Consider the column to be part of a DataFrame:

df.with_columns(pl.col('path').str.replace_all(r'[a-z]\](,)\s', '-'))

Am I missing something? I'd appreciate any input or ideas!


Solution

  • It seems you may be under the impression that only the capture group is replaced. In fact, the entire match is replaced.

    The main use of a capture group is to reference it on the replacement side, so you capture the parts of the match you want to keep.

    In this case, whatever comes before the comma:

    import polars as pl
    
    s = pl.Series('path', ['[1, Phone], [2, Tablet], [3, Tablet], [4, Phone], [5, Phone], [6, Phone]'])
    
    s.str.replace_all(r'([a-z]\]),\s', '$1-')
    
    shape: (1,)
    Series: 'path' [str]
    [
        "[1, Phone]-[2, Tablet]-[3, Tablet]-[4, Phone]-[5, Phone]-[6, Phone]"
    ]
    

    $1 is replaced with the contents of capture group 1.

    Polars' replacement syntax follows that of the underlying Rust regex crate:

    https://docs.rs/regex/latest/regex/struct.Regex.html#replacement-string-syntax