
How to parse Multiline block text if content differs from block to block using Python & regex?


I have a configuration file that I need to parse. The idea is to put it in a dictionary at a later stage, using the regex groups in Python.

The problem I'm facing is that not every block of text contains exactly the same lines. My regex works for the block with the most lines, but of course it only matches that single block. How do I do a multiline match when some "set" lines are omitted in some blocks?

  • Do I need to break up the regex and use if/elif/else, true/false statements to work through this? That does not seem Pythonic, IMHO.

  • I'm quite sure I'm going to have to break up my big regex and work through it sequentially: if a line matches then..., else skip to the next regex-matching line.

  • I was thinking of putting every block from edit to next into a list element to be parsed separately. Or can I just do the whole thing in one go?

I have some ideas, but I would like a Pythonic way of doing it, please.

As always, your help is much appreciated. Thank you

TEXT, where the block to match runs from edit to next. Not every block contains the same "set" statements:

edit "port11"
    set vdom "ACME_Prod"
    set vlanforward enable
    set type physical
    set device-identification enable
    set snmp-index 26
next
edit "port21"
    set vdom "ACME_Prod"
    set vlanforward enable
    set type physical
    set snmp-index 27
next
edit "port28"
    set vdom "ACME_Prod"
    set vlanforward enable
    set type physical
    set snmp-index 28
next
edit "port29"
    set vdom "ACME_Prod"
    set ip 174.244.244.244 255.255.255.224
    set allowaccess ping
    set vlanforward enable
    set type physical
    set alias "Internet-IRISnet"
    set snmp-index 29
next
edit "port20"
    set vdom "root"
    set ip 192.168.1.1 255.255.255.0
    set allowaccess ping https ssh snmp fgfm
    set vlanforward enable
    set type physical
    set snmp-index 39
next
edit "port25"
    set vdom "root"
    set allowaccess fgfm
    set vlanforward enable
    set type physical
    set snmp-index 40
next

CODE SNIPPET :

import re, pprint
file = "interfaces_2016_10_12.conf"

try:
    fileopen = open(file, 'r')
    output = open('output.txt', 'w+')
except IOError:
    exit("Input file does not exist, exiting script.")

# read whole config in 1 go instead of iterating line by line
text = fileopen.read()

# my verbose regex, verbose so it is more readable !

pattern = r'''^                 # raw string; ^ matches at each line start with re.MULTILINE
\s+edit\s\"(.*)\"\n           # group(1) match int name
\s+set\svdom\s\"(.*)\"\n      # group(2) match vdom name
\s+set\sip\s(.*)\n            # group(3) match interface ip
\s+set\sallowaccess\s(.*)\n   # group(4) match allowaccess
\s+set\svlanforward\s(.*)\n   # group(5) match vlanforward
\s+set\stype\s(.*)\n          # group(6) match type
\s+set\salias\s\"(.*)\"\n     # group(7) match alias
\s+set\ssnmp-index\s\d{1,3}\n # match snmp-index but we don't need it
\s+next$'''                   # match end of config block

regexp = re.compile(pattern, re.VERBOSE | re.MULTILINE)

# For multiline regex matching use finditer():
for match in regexp.finditer(text):
    for z in range(1, 8):  # groups 1..7, restarted for every match
        print(match.group(z))

fileopen.close()  # always close file
output.close()  # always close file

Solution

  • Why use a regex when this seems a pretty simple structure to parse? A plain line-by-line loop will do:

    data = {}
    with open(file, 'r') as fileopen:
        for line in fileopen:
            words = line.strip().split()
            if not words:           # Skip blank lines
                continue
            if words[0] == 'edit':  # Create a new block
                curr = data.setdefault(words[1].strip('"'), {})
            elif words[0] == 'set': # Write config to block
                # A single value is stored as a string, several values as a list
                curr[words[1]] = words[2].strip('"') if len(words) == 3 else words[2:]
    print(data)
    

    Output:

    {'port11': {'device-identification': 'enable',
      'snmp-index': '26',
      'type': 'physical',
      'vdom': 'ACME_Prod',
      'vlanforward': 'enable'},
     'port20': {'allowaccess': ['ping', 'https', 'ssh', 'snmp', 'fgfm'],
      'ip': ['192.168.1.1', '255.255.255.0'],
      'snmp-index': '39',
      'type': 'physical',
      'vdom': 'root',
      'vlanforward': 'enable'},
      ...
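
If a regex is still preferred, the third idea from the question (treat every edit ... next block as its own unit and parse it separately) can be sketched in two stages. This is only a rough sketch, not part of the original answer: the names `SAMPLE`, `BLOCK_RE`, `SET_RE` and `parse_blocks` are made up for illustration, and it assumes that every block ends with a line containing only `next` and that every `set` line has at least one value. Because each `set` line is matched on its own, it does not matter which lines a block omits.

```python
import re

# Hypothetical sample in the same format as the config in the question.
SAMPLE = '''\
edit "port11"
    set vdom "ACME_Prod"
    set type physical
    set snmp-index 26
next
edit "port20"
    set vdom "root"
    set ip 192.168.1.1 255.255.255.0
    set allowaccess ping https ssh snmp fgfm
next
'''

# Stage 1: one match per block, from an "edit" line up to the following "next" line.
BLOCK_RE = re.compile(
    r'^[ \t]*edit[ \t]+"(?P<name>[^"]+)"[ \t]*\n'  # block header with the quoted name
    r'(?P<body>.*?)'                               # everything in between, non-greedy
    r'^[ \t]*next[ \t]*$',                         # line that terminates the block
    re.MULTILINE | re.DOTALL,
)

# Stage 2: one match per "set <key> <value ...>" line inside a block body.
SET_RE = re.compile(
    r'^[ \t]*set[ \t]+(?P<key>\S+)[ \t]+(?P<value>\S.*?)[ \t]*$',
    re.MULTILINE,
)


def parse_blocks(text):
    """Return {block name: {key: value}} for every edit ... next block in text."""
    data = {}
    for block in BLOCK_RE.finditer(text):
        settings = {}
        for item in SET_RE.finditer(block.group('body')):
            words = item.group('value').split()
            # Same convention as the loop above: one value -> string, several -> list.
            settings[item.group('key')] = (
                words[0].strip('"') if len(words) == 1 else words
            )
        data[block.group('name')] = settings
    return data


result = parse_blocks(SAMPLE)
```

Here `result['port11']` has no `ip` key at all, while `result['port20']['ip']` is a two-element list, so missing lines simply end up as missing dictionary keys. For a format this regular, the plain loop above is probably still the simpler choice.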