Python multiline regex ignore n lines in string


I'm having trouble writing a correct regex. Can anyone help?

I have output from two network devices:

1

VRF NAME1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>
Old CLI format, supports IPv4 only
Flags: 0xC
Interfaces:
Gi1/1/1                 Gi1/1/4

2

VRF NAME2 (VRF Id = 2); default RD 101:2; default VPNID <not set>
Interfaces:
Gi0/0/3                  Gi0/0/4                  Gi0/1/4

I need to extract the interface names from both.

I have regex:

 rx = re.compile(r"""
              VRF\s(.+?)\s\(.*RD\s(.*);.*[\n\r]
              ^.*$[\n\r]
              ^.*$[\n\r]
              ^.*$[\n\r]
              (^.*)
              """,re.MULTILINE|re.VERBOSE)

But it only works for the first output: it skips four lines, and the fifth line is exactly what I need. However, many routers return output like the second one. How can I skip an unknown number of lines, find the line containing "Interfaces:", and extract the line right after it?


Solution

  • EDIT: the answer has been corrected after more input was provided.

    There are many ways to solve this. You can try the pattern out on regex101. The regex

    (?s)VRF\s([^\s]+)\s.*?(?:RD\s([\d.]+:\d|<not\sset>));.*?Interfaces:(?:\r*\n)\s*(.*?)(?:\r*\n)
    

    reads in a complete record and captures the name, the RD value, and the line following "Interfaces:".

    Explanation:

    (?s)                           # single line mode: make "." read anything,
                                   # including line breaks
    VRF                            # every record starts with VRF
    \s                             # read " "
    ([^\s]+)                       # group 1: capture the VRF name
    \s                             # read " "
    .*?                            # lazy read anything
    (?:                            # start non-capture group
     RD\s                          # read "RD "
    (                              # group 2
      [\d.]+:\d                    # number or ip, followed by ":" and a digit
      |                            # OR
      <not\sset>                   # value "<not set>"
    )                              # group 2 end
    )                              # non-capture group end
    ;                              # read ";"
    .*?                            # lazy read anything
    Interfaces:                    # read "Interfaces:"
    (?:\r*\n)                      # read newline
    \s*                            # read spaces
    (.*?)                          # group 3: read line after "Interfaces:"
    (?:\r*\n)                      # read newline
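
The annotated breakdown above can also live inside the code itself. A minimal sketch, assuming Python's `re.VERBOSE` flag for the inline comments and `re.DOTALL` as the flag form of the inline `(?s)`; the names `VRF_RE` and `sample` are made up for illustration, and the sample is the second output from the question:

```python
import re

# Same pattern as above, split over several lines. With re.VERBOSE, whitespace
# and "#" comments inside the pattern are ignored; the pattern itself only
# uses \s, so no literal spaces are lost.
VRF_RE = re.compile(
    r"""
    VRF\s([^\s]+)\s                   # group 1: VRF name
    .*?                               # lazily skip to the RD field
    (?:RD\s([\d.]+:\d|<not\sset>));   # group 2: RD value or <not set>
    .*?Interfaces:                    # skip any number of lines
    (?:\r*\n)\s*                      # newline plus leading spaces
    (.*?)                             # group 3: the line after Interfaces:
    (?:\r*\n)                         # end of that line
    """,
    re.VERBOSE | re.DOTALL,
)

sample = (
    "VRF NAME2 (VRF Id = 2); default RD 101:2; default VPNID <not set>\n"
    "Interfaces:\n"
    "Gi0/0/3                  Gi0/0/4                  Gi0/1/4\n"
)

m = VRF_RE.match(sample)
name, rd, interfaces = m.groups()
print(name, rd, interfaces.split())
```

Calling `interfaces.split()` turns the captured line into a list of individual interface names.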
    

    Let's look at a test script. I've shortened the records a bit, but the point still stands.

    $ cat test.py
    import os
    import re
    
    pattern = r"(?s)VRF\s([^\s]+)\s.*?(?:RD\s([\d.]+:\d|<not\sset>));.*?Interfaces:(?:\r*\n)\s*(.*?)(?:\r*\n)"
    
    text = '''\
    VRF BLA1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>
    Old CLI format, supports IPv4 only
    Flags: 0xC
    Interfaces:
      Gi1/1/1.451              Gi1/1/4.2019
    Address family ipv4 unicast (Table ID = 0x2):
      VRF label allocation mode: per-prefix
    Address family ipv6 unicast not active
    Address family ipv4 multicast not active
    
    VRF BLA2 (VRF Id = 1); default RD <not set>; default VPNID <not set>
    New CLI format, supports multiple address-families
    Flags: 0x1808
    Interfaces:
      Gi0
    Address family ipv4 unicast (Table ID = 0x1):
      Flags: 0x0
    Address family ipv6 unicast (Table ID = 0x1E000001):
      Flags: 0x0
    Address family ipv4 multicast not active\
    '''
    
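    # Records are separated by a blank line. The literal above uses "\n",
    # so split on that rather than on os.linesep ("\r\n" on Windows).
    for rec in text.split("\n\n"):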
        m = re.match(pattern, rec)
        if m:
            print("%s\tRD: %s\tInterfaces: %s" % (m.group(1), m.group(2), m.group(3)))
    

    which results in:

    $ python test.py
    BLA1    RD: 9200:1  Interfaces: Gi1/1/1.451              Gi1/1/4.2019
    BLA2    RD: <not set>   Interfaces: Gi0
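
If splitting on blank lines is not reliable for your device output, the same idea can be applied to the whole text in one pass. This is only a sketch under assumptions that are not part of the answer above: it adds a `^` anchor with the `m` (multiline) flag so that an indented line such as "VRF label allocation mode" is not mistaken for the start of a record, and the names `PATTERN`, `parse_vrfs` and `output` are hypothetical.

```python
import re

# The answer's pattern plus an assumed "^" anchor ((?m) makes "^" match at
# every line start), so only lines that begin with "VRF" open a record.
PATTERN = re.compile(
    r"(?sm)^VRF\s([^\s]+)\s.*?(?:RD\s([\d.]+:\d|<not\sset>));"
    r".*?Interfaces:(?:\r*\n)\s*(.*?)(?:\r*\n)"
)


def parse_vrfs(output):
    """Map each VRF name to its RD value and a list of interface names."""
    result = {}
    for m in PATTERN.finditer(output):
        name, rd, line = m.groups()
        result[name] = {"rd": rd, "interfaces": line.split()}
    return result


output = (
    "VRF BLA1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>\n"
    "Old CLI format, supports IPv4 only\n"
    "Flags: 0xC\n"
    "Interfaces:\n"
    "  Gi1/1/1.451              Gi1/1/4.2019\n"
    "Address family ipv4 unicast (Table ID = 0x2):\n"
    "  VRF label allocation mode: per-prefix\n"
    "\n"
    "VRF BLA2 (VRF Id = 1); default RD <not set>; default VPNID <not set>\n"
    "Interfaces:\n"
    "  Gi0\n"
    "Address family ipv4 unicast (Table ID = 0x1):\n"
)

for vrf, info in parse_vrfs(output).items():
    print(vrf, info["rd"], info["interfaces"])
```

One caveat with this variant: because `.*?` can cross line breaks, a record that has no "Interfaces:" line would borrow the one from the next record, so it is worth checking the results against a device that has such a VRF.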