Regex to handle a varying number of variables


I'm trying to change a string that looks something like this:

s = 'g1 & g2 & (X~(~g1 & ~g2) & ~o1) & (XX~(~g1 & ~g2) & ~o1 & X~o1)'

to this:

'g1_0 & g2_0 & (~(~g1_1 & ~g2_1) & ~o1_0) & (~(~g1_2 & ~g2_2) & ~o1_0 & ~o1_1)'

So basically, I'm appending _# (an underscore and a number) to each variable, where the number is the count of X's in front of it, and removing the X's. The difficulty arises mostly when the X's precede parentheses, because I don't know a priori how many variables and which logical operators are inside the parentheses.

I'm trying to do this in Python. I work backwards from the largest number of X's (because if I started by looking for bare g1's, all of them would change). This is the sequence:

import re
# s is the string above; n is assumed to start at the longest run of X's in s
n = max((len(run) for run in re.findall(r'X+', s)), default=0)
xs = 'X'*n
while n>0:
  # this is for when we have parentheses
  s = re.sub(r'%s([~]*)([(]+[~]*[a-zA-Z]+[0-9]+) ([&|]*) ([~]*[a-zA-Z]+[0-9]+)([)]+)'%xs,
             r'\1\2_%d \3 \4_%d\5'%(n,n), s)
  # this is for normal variables
  s = re.sub(r'%s([~]*[a-zA-Z]*[0-9]*)'%xs, r'\1_%d'%n, s)
  xs = xs[:-1]
  n -= 1

This continues down to zero X's. The problem is that I don't want to hard-code the structure 'o/g &/| o/g'; I want to allow a variable number of names and operators while still assigning the correct suffixes. For example, I'd like to handle:

XX(~g1 & ~g2 | ~k3)  --> (~g1_2 & ~g2_2 | ~k3_2)

How can I do it with Regex?
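
The limitation described in the question can be reproduced with a quick check. The sketch below reuses the question's first pattern with `xs = 'XX'` (that is, n = 2); the variable names `pattern`, `two_operands`, and `three_operands` are introduced here only for illustration.

```python
import re

# Pattern copied from the question's first substitution, with xs = 'XX' (n = 2).
pattern = r'XX([~]*)([(]+[~]*[a-zA-Z]+[0-9]+) ([&|]*) ([~]*[a-zA-Z]+[0-9]+)([)]+)'

two_operands = 'XX~(~g1 & ~g2)'
three_operands = 'XX(~g1 & ~g2 | ~k3)'

# Two operands fit the hard-coded "name op name" shape, so the pattern matches.
matched_two = re.search(pattern, two_operands) is not None

# With a third operand, " | ~k3" sits between "~g2" and ")", so nothing matches.
matched_three = re.search(pattern, three_operands) is not None

# The substitution therefore only rewrites the two-operand group.
rewritten_two = re.sub(pattern, r'\1\2_2 \3 \4_2\5', two_operands)

print(matched_two, matched_three)  # True False
print(rewritten_two)               # ~(~g1_2 & ~g2_2)
```

Because the pattern spells out exactly one operator between exactly two names, every extra operand would need its own pattern, which is why a single flat regex does not generalize here.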


Solution

  • You can combine recursion with re (the := operator used here requires Python 3.8 or later):

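    import re

    def rep_x(d, c = 0):
       # d: text still to be scanned; c: X count inherited from enclosing parentheses
       # s: output built so far; f: X's seen since the last variable or '('
       s, f = '', 0
       while d:
          if d[0] == ')':
             # end of this group: return the rewritten group and the unscanned rest
             return s+')', d[1:]
          if d[0] == '(':
             # recurse into the group, passing down the pending X count, then reset it
             [_s, d], f = rep_x(d[1:], c = c+f), 0
             s += '('+_s
          elif (x:=re.findall(r'^X+', d)):
             # drop a run of X's and remember its length
             d = d[(f:=len(x[0])):]
          elif (x:=re.findall(r'^\w+', d)):
             # a variable name: append '_<count>' and reset the pending count
             s, f, d = s + x[0]+'_'+str(f+c), 0, d[len(x[0]):]
          else:
             # operators, '~' and spaces are copied through unchanged
             s, d = s+d[0], d[1:]
       return s, d

    r1, _ = rep_x('g1 & g2 & (X~(~g1 & ~g2) & ~o1) & (XX~(~g1 & ~g2) & ~o1 & X~o1)')
    r2, _ = rep_x('XX(~g1 & ~g2 | ~k3)')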
    

    Output:

    'g1_0 & g2_0 & (~(~g1_1 & ~g2_1) & ~o1_0) & (~(~g1_2 & ~g2_2) & ~o1_0 & ~o1_1)'
    '(~g1_2 & ~g2_2 | ~k3_2)'
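
The same idea can also be written without recursion. The following is a minimal sketch, not the answer's own code: it assumes a single tokenizing regex plus an explicit stack in place of the recursive calls, and the names `TOKEN` and `add_suffixes` are hypothetical. Like `rep_x`, it treats any leading run of `X` as the operator, so a variable whose name begins with `X` would be misread.

```python
import re

# One alternation splits the input into runs of X, names, parentheses,
# and any other single character (operators, '~', spaces).
TOKEN = re.compile(
    r'(?P<xs>X+)|(?P<name>\w+)|(?P<open>\()|(?P<close>\))|(?P<other>.)',
    re.S,
)

def add_suffixes(expr):
    out = []
    stack = []      # inherited counts saved when entering each '('
    inherited = 0   # X's applied by the enclosing parentheses
    pending = 0     # X's seen since the last name or '('
    for m in TOKEN.finditer(expr):
        kind, text = m.lastgroup, m.group()
        if kind == 'xs':
            pending = len(text)          # the X's themselves are dropped
        elif kind == 'name':
            out.append(f'{text}_{inherited + pending}')
            pending = 0
        elif kind == 'open':
            stack.append(inherited)      # remember the outer count
            inherited += pending         # X's before '(' apply to the whole group
            pending = 0
            out.append(text)
        elif kind == 'close':
            if stack:                    # tolerate an unbalanced ')'
                inherited = stack.pop()
            pending = 0
            out.append(text)
        else:
            out.append(text)
    return ''.join(out)

r1 = add_suffixes('g1 & g2 & (X~(~g1 & ~g2) & ~o1) & (XX~(~g1 & ~g2) & ~o1 & X~o1)')
r2 = add_suffixes('XX(~g1 & ~g2 | ~k3)')
r3 = add_suffixes('X(g1 & X(g2))')
```

On the two inputs from the question this sketch gives the same strings as the output shown above. For the nested case `r3`, the counts accumulate across levels, giving `'(g1_1 & (g2_2))'`, which mirrors the `c+f` passed down in `rep_x`. The explicit stack also avoids Python's recursion limit on very deeply nested input.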