python regex python-2.7

Replace named captured groups with arbitrary values in Python


I need to replace the value inside each capture group of a regular expression with some arbitrary value. I've looked at re.sub, but it seems to work differently.

I have a string like this one:

s = 'monthday=1, month=5, year=2018'

and a regex that matches it, with named capture groups, like the following:

regex = re.compile(r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})')

Now I want to replace the group named d with aaa, the group named m with bbb and the group named Y with ccc, as in the following example:

'monthday=aaa, month=bbb, year=ccc'

Basically, I want to keep all the non-matching text and substitute each matched group with some arbitrary value.
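For contrast, here is a minimal sketch of what re.sub does with this regex by default, using the s and regex from above (written as a raw string). It replaces the entire match with the template, and named groups are only reachable inside the template as backreferences like \g<d>, which reuse captured text rather than replace it in place:

```python
import re

s = 'monthday=1, month=5, year=2018'
regex = re.compile(r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})')

# re.sub replaces the *entire* match with the template string, so the
# literal text around the groups ("monthday=", ", month=", ...) is lost too.
whole = regex.sub('aaa', s)
print(whole)  # aaa

# Named groups can only be referenced (\g<name>) inside the template:
# this reuses the captured values, it does not swap them out in place.
reordered = regex.sub(r'\g<Y>-\g<m>-\g<d>', s)
print(reordered)  # 2018-5-1
```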

Is there a way to achieve the desired result?

Note

This is just an example; I could have other input regexes with a different structure but the same named capturing groups.

Update

Since most people seem to be focusing on the sample data, here is another sample. Let's say that I have this other input data and regex:

input = '2018-12-12'
regex = r'((?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2}))'

As you can see, I still have the same three named capturing groups (plus an unnamed outer group), but the structure is totally different. What I need, as before, is to replace each named capturing group with some arbitrary text:

'ccc-bbb-aaa'

That is, replace the capture group named Y with ccc, the capture group named m with bbb and the capture group named d with aaa.
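A quick check of how Python numbers the groups in this second regex may help here, since any group-by-group substitution has to map names to positions. This sketch only uses the standard Pattern.groups, Pattern.groupindex and Match.span attributes; it shows that the outer parentheses add a fourth, unnamed group and that each named group's span marks exactly the text to swap out:

```python
import re

regex = re.compile(r'((?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2}))')

# The outer parentheses form an unnamed capturing group (index 1),
# so the named groups land at indices 2, 3 and 4.
group_count = regex.groups
name_to_index = sorted(regex.groupindex.items(), key=lambda item: item[1])
print(group_count)    # 4
print(name_to_index)  # [('Y', 2), ('m', 3), ('d', 4)]

# Each named group's (start, end) span locates the text to be replaced.
match = regex.search('2018-12-12')
spans = {name: match.span(name) for name in ('Y', 'm', 'd')}
print(spans['Y'], spans['m'], spans['d'])  # (0, 4) (5, 7) (8, 10)
```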

If regexes are not the best tool for the job, I'm open to other proposals that achieve my goal.


Solution

  • This is a completely backwards use of regex. The point of capture groups is to hold text you want to keep, not text you want to replace.

    Since your regex captures the text you want to replace rather than the text you want to keep, you have to do most of the substitution manually:

    import re

    def replace_groups(pattern, string, replacements):
        """
        Replaces the text captured by named groups.
        """
        pattern = re.compile(pattern)
        # create a dict of {group_index: group_name} for use later
        groupnames = {index: name for name, index in pattern.groupindex.items()}

        def repl(match):
            # we have to split the matched text into chunks we want to keep and
            # chunks we want to replace:
            # captured text will be replaced, uncaptured text will be kept.
            # match.start(i) and match.end(i) are offsets into the whole input
            # string, so slice match.string instead of match.group().
            source = match.string
            chunks = []
            lastindex = match.start()
            for i in range(1, pattern.groups + 1):
                groupname = groupnames.get(i)
                # skip unnamed groups, groups with no replacement, and groups
                # that did not take part in the match (start is -1)
                if groupname not in replacements or match.start(i) == -1:
                    continue

                # keep the text between the previous group and this one
                chunks.append(source[lastindex:match.start(i)])
                # then, instead of the captured text, insert the replacement text for this group
                chunks.append(replacements[groupname])
                lastindex = match.end(i)
            # keep whatever follows the last replaced group, up to the end of the match
            chunks.append(source[lastindex:match.end()])
            # join all the chunks to obtain the final string with replacements
            return ''.join(chunks)

        # call our custom replacement function for each occurrence
        return pattern.sub(repl, string)

    >>> replace_groups(regex, s, {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'})
    'monthday=aaa, month=bbb, year=ccc'
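To illustrate the point that capture groups are meant to hold text you want to keep, here is a sketch of the "forward" approach. It assumes you are free to rewrite the regexes, which the question says may not be the case, so it is an illustration rather than a drop-in answer. The patterns keep1 and keep2 are hypothetical rewrites of the question's two regexes: the groups capture the literal separators, the values are left uncaptured, and a plain re.sub template does the rest:

```python
import re

# Hypothetical rewrites of the question's regexes: the groups capture the
# literal text to KEEP, and the values to be replaced are left uncaptured.
keep1 = re.compile(r'(monthday=)\d{1,2}(, month=)\d{1,2}(, year=)20\d{2}')
keep2 = re.compile(r'20\d{2}(-)[0-1]?\d(-)\d{2}')

# In the template, \g<1>, \g<2>, ... re-insert the kept text; everything
# else in the template is inserted literally.
result1 = keep1.sub(r'\g<1>aaa\g<2>bbb\g<3>ccc', 'monthday=1, month=5, year=2018')
result2 = keep2.sub(r'ccc\g<1>bbb\g<2>aaa', '2018-12-12')
print(result1)  # monthday=aaa, month=bbb, year=ccc
print(result2)  # ccc-bbb-aaa
```

With this layout the replacement values live in the template, so no custom replacement function is needed; the trade-off is that the template has to be written per regex.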