Search code examples
pythonregexpython-re

How to get list of possible replacements in string using regex in python?


I have the following strings:

4563_1_some_data

The general pattern is

r'\d{1,5}_[1-4]_some_data

Note, that numbers before first underscore may be the same for different some_data
So the question is: how to get all possible variants of replacement using regex?
Desired output:

[4563_1_some_data, 4563_2_some_data, 4563_3_some_data, 4563_4_some_data]

My attempt:

[re.sub(r'_\d_', r'_[1-4]_', row).group() for row in data]

But it has result:

4563_[1-4]_some_data

Seems like ordinary replacement. How should I activate pattern replacement and get the list?


Solution

  • You need to iterate over a range object to create your list.

    1. Create that range object: To do that you need a pattern to get the [1-4] part from your pattern.
    2. You'll need another pattern to replace the number in the actual data with the variable from range object.
    import re
    
    text = "4563_1_some_data"
    original_pattern = r"\d{1,5}_[1-4]_some_data"
    
    # A regex to get the `[1-4]` part.
    find_numbers_pattern = r"(?<=_)\[(\d)\-(\d)\](?=_)"
    
    # Get `1` and `4` to create a range object
    a, b = map(int, re.search(find_numbers_pattern, original_pattern).groups())
    
    # Use `count=` to only change the first match.
    lst = [re.sub(r"(?<=_)\d(?=_)", str(i), text, count=1) for i in range(a, b + 1)]
    
    print(lst)
    

    output:

    ['4563_1_some_data', '4563_2_some_data', '4563_3_some_data', '4563_4_some_data']
    
    (?<=_)\[(\d)\-(\d)\](?=_) explanation:

    (?<=_): A positive lookbehind assertion to match _.

    \[(\d)\-(\d)\]: to get the [1-4] for example.

    (?=_): A positive lookahead assertion to find the _.