regex, ultraedit

Regex expression to back reference more than 9 values in a replace


I have a regex that traverses a string and pulls out 40 values. It looks sort of like the pattern below, but much larger and more complicated:

est(.*)/test>test>(.*)<test><test>(.*)test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test>

My question is: how do I use these groups in the replace command when the group number exceeds 9? Whenever I use \10, it seems to return the value of \1 with a 0 appended to the end.

Any help would be much appreciated, thanks :)

Also, I am using UEStudio, but if a different program does it better, then no biggie :)


Solution

  • Most of the simple regex engines used by editors can't handle more than nine numbered groups in a replacement, and UltraEdit doesn't seem to be an exception. I just tried Notepad++ and it won't even match a regex with 10 groups.

    Your best bet, I think, is to write a quick script in a language with a decent regex engine, though that doesn't answer the question as asked.

    Here's something in Python:

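    import re
    
    # 15 capture groups, each matching one (non-newline) character
    pattern = re.compile(r'(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)')
    with open('input.txt', 'r') as f:
        for line in f:
            m = pattern.match(line)
            if m:  # match() returns None for lines with fewer than 15 characters
                print(m.groups())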
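
    The script above only prints the groups. To perform a replacement that uses groups past 9, one option is to pass re.sub a function instead of a template string, so each group is fetched by integer with m.group(n) and no \10-style parsing is involved. This is a minimal sketch that reuses the same 15-group pattern as a stand-in for the real 40-group expression; the output order (ORDER) and the rewrite() helper are hypothetical:

```python
import re

# Same 15-group pattern as the script above (an illustrative stand-in,
# not the asker's real 40-group expression).
pattern = re.compile(r'(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)')

# Hypothetical output order: groups 11-15 first, then groups 1-10.
ORDER = list(range(11, 16)) + list(range(1, 11))

def reorder(m):
    # m.group(n) takes a plain integer, so group 10 and above are
    # addressed exactly like group 1; no replacement-string parsing.
    return ''.join(m.group(n) for n in ORDER)

def rewrite(line):
    # Only the first 15-character run is replaced; shorter lines pass
    # through unchanged because the pattern does not match them.
    return pattern.sub(reorder, line, count=1)

print(rewrite('abcdefghijklmno'))  # klmnoabcdefghij
```

    Because the replacement is ordinary Python, there is no template syntax to get wrong; the cost is writing a small function per replacement.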
    

    Note that Python reads \20 as a backreference to group 20. To get group 2 followed by a literal 0, use \g<2>0, which is unambiguous.
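
    A quick way to see the difference is to build a pattern with 20 single-character groups and compare the templates with re.sub; the pattern and input string here are illustrative only:

```python
import re

# 20 single-character groups, so \20 refers to a group that exists.
pattern = re.compile('(.)' * 20)
text = 'abcdefghijklmnopqrst'  # exactly 20 characters, a..t

# Python reads \20 as group 20 ("t"), not group 2 followed by "0".
print(pattern.sub(r'\20', text))     # t
# \g<2>0 is group 2 ("b") followed by a literal "0".
print(pattern.sub(r'\g<2>0', text))  # b0
# \g<20> spells out group 20 explicitly.
print(pattern.sub(r'\g<20>', text))  # t
```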

    Edit: Most regex flavors, and editors that include a regex engine, should follow a replacement syntax like this:

    abcdefghijklmnop
    search: (.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(?<name>.)(.)
    note:    1  2  3  4  5  6  7  8  9  10 11 12 13
    value:   a  b  c  d  e  f  g  h  i  j  k  l  m
    replace result:
        \11      a1      i.e.: match 1, then the character "1" (engines limited to \1-\9)
        ${12}    l       most should support this
        ${name}  l       few support named references, but use them where you can.
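
    For comparison, here is a sketch of the same table in Python's re module, which writes \g<12> and \g<name> where the table shows ${12} and ${name}, and spells the named group (?P<name>...). Match.expand() applies a replacement template to a single match; the pattern and input string come from the table:

```python
import re

# The table's search pattern, rewritten for Python: the named group is
# spelled (?P<name>...) and is still group 12.
pattern = re.compile(r'(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(?P<name>.)(.)')
m = pattern.match('abcdefghijklmnop')

# Match.expand() applies a replacement template to this one match.
print(m.expand(r'\g<12>'))    # l   Python's spelling of ${12}
print(m.expand(r'\g<name>'))  # l   Python's spelling of ${name}
print(m.expand(r'\g<1>1'))    # a1  group 1, then the character "1"
print(m.expand(r'\11'))       # k   Python reads \11 as group 11
```

    The last line differs from the table: because Python accepts two-digit group numbers, \11 means group 11 there, which is exactly the ambiguity \g<1>1 exists to avoid.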
    

    Named references are usually available only in specific regex flavors, so test your tool to know for sure.