I need to parse some load balancer configuration section. It's seemingly simple (at least for a human).
Config consists of several objects with their content in curly braces like so:
ltm rule ssl-header-insert {
when HTTP_REQUEST {
HTTP::header insert "X-SSL-Connection" "yes"
}
}
ltm rule some_redirect {
priority 1
when HTTP_REQUEST {
if { (not [class match [IP::remote_addr] equals addresses_group ]) }
{
HTTP::redirect "http://some.page.example.com"
TCP::close
event disable all
}
}
The contents of each section/object is a TCL code so there will be nested curly braces. What I want to achieve is to parse this in pairs as: object identifier (after ltm rule
keywords) and it's contents (tcl code within braces) as it is.
I've looked around some examples and experimented a lot, but it's really giving me a hard time. I did some debugging within pyparsing (which is a bit confusing to me too) and I think that I'm failing to detect closing braces somehow, but can't figure that out.
What I came up with so far:
from pyparsing import *
import json
list_sample = """ltm rule ssl-header-insert {
when HTTP_REQUEST {
HTTP::header insert "X-SSL-Connection" "yes"
}
}
ltm rule some_redirect {
priority 1
when HTTP_REQUEST {
if { (not [class match [IP::remote_addr] equals addresses_group ]) }
{
HTTP::redirect "http://some.page.example.com"
TCP::close
event disable all
}
}
}
ltm rule http_header_replace {
when HTTP_REQUEST {
HTTP::header replace Host some.host.example.com
}
}"""
ParserElement.defaultWhitespaceChars=(" \t")
NL = LineEnd()
END = StringEnd()
LBRACE, RBRACE = map(Suppress, '{}')
ANY_HEADER = Suppress("ltm rule ") + Word(alphas, alphanums + "_-")
END_MARK = Literal("ltm rule")
CONTENT_LINE = (~ANY_HEADER + (NotAny(RBRACE + FollowedBy(END_MARK)) + ~END + restOfLine) | (~ANY_HEADER + NotAny(RBRACE + FollowedBy(END)) + ~END + restOfLine)) | (~RBRACE + ~END + restOfLine)
ANY_HEADER.setName("HEADER").setDebug()
LBRACE.setName("LBRACE").setDebug()
RBRACE.setName("RBRACE").setDebug()
CONTENT_LINE.setName("LINE").setDebug()
template_defn = ZeroOrMore((ANY_HEADER + LBRACE +
Group(ZeroOrMore(CONTENT_LINE)) +
RBRACE))
template_defn.ignore(NL)
results = template_defn.parseString(list_sample).asList()
print("Raw print:")
print(results)
print("----------------------------------------------")
print("JSON pretty dump:")
print json.dumps(results, indent=2)
I see in the debug that some of the matches work but in the end it fails with an empty list as a result.
On a sidenote - my CONTENT_LINE
part of the grammar is probably overly complicated in general, but I didn't find any simpler way to cover it so far.
The next thing would be to figure out how to preserve new lines and tabs in content part, since I need that to be unchanged in the output. But looks like I have to use ignore()
function - which is skipping new lines - to parse the multiline text in the first place, so that's another challenge.
I'd be grateful for someone to help me find out what the issues are. Or maybe I should take some other approach?
I think nestedExpr('{', '}')
will help. That will take care of the nested '{}'s, and wrapping in originalTextFor
will preserve newlines and spaces.
import pyparsing as pp
LTM, RULE = map(pp.Keyword, "ltm rule".split())
ident = pp.Word(pp.alphas, pp.alphanums+'-_')
ltm_rule_expr = pp.Group(LTM + RULE
+ ident('name')
+ pp.originalTextFor(pp.nestedExpr('{', '}'))('body'))
Using your sample string (after adding missing trailing '}'):
for rule, _, _ in ltm_rule_expr.scanString(sample):
print(rule[0].name, rule[0].body.splitlines()[0:2])
gives
ssl-header-insert ['{', ' when HTTP_REQUEST {']
some_redirect ['{', ' priority 1']
dump()
is also a good way to list out the contents of a returned ParseResults:
for rule, _, _ in ltm_rule_expr.scanString(sample):
print(rule[0].dump())
print()
['ltm', 'rule', 'ssl-header-insert', '{\n when HTTP_REQUEST {\n HTTP::header insert "X-SSL-Connection" "yes"\n}\n}']
- body: '{\n when HTTP_REQUEST {\n HTTP::header insert "X-SSL-Connection" "yes"\n}\n}'
- name: 'ssl-header-insert'
['ltm', 'rule', 'some_redirect', '{\n priority 1\n\nwhen HTTP_REQUEST {\n\n if { (not [class match [IP::remote_addr] equals addresses_group ]) }\n {\n HTTP::redirect "http://some.page.example.com"\n TCP::close\n event disable all\n }\n}}']
- body: '{\n priority 1\n\nwhen HTTP_REQUEST {\n\n if { (not [class match [IP::remote_addr] equals addresses_group ]) }\n {\n HTTP::redirect "http://some.page.example.com"\n TCP::close\n event disable all\n }\n}}'
- name: 'some_redirect'
Note that I broke up 'ltm'
and 'rule'
into separate keyword expressions. This guards against the case where a developer may have written valid code as ltm rule blah
, with > 1 space between "ltm" and "rule". This kind of thing happens all the time, you never know where whitespace will crop up.