How to parse a list of lines into a single group based on the line prefix using pyparsing

I am trying to parse the output of the command ip netns exec vpn_ns ipsec stroke statusall (example pasted below).

The command prints several lines for each id of the form oof-#n-#i, where #n is the terminator and #i is the instance using that terminator. For example, oof-2-1 is terminator server oof-2, instance 1.

How do I declare a match that collects all the lines prefixed by the same id?

From the example I am trying to get to something like this dict:

results = {
    'connections':
        {
            'oof-1-1': [ 3 lines starting with oof-1-1 in section "Connections" ],
            'oof-1-2': [ 3 lines starting with oof-1-2 in section "Connections" ],
            'oof-2-1': [ 3 lines starting with oof-2-1 in section "Connections" ]
        },

    'sec_assocs':
        {
            'oof-1-1': [ 3 lines starting with oof-1-1 in section "Security Associations" ],
            'oof-1-2': [ 3 lines starting with oof-1-2 in section "Security Associations" ],
            'oof-2-1': [ 3 lines starting with oof-2-1 in section "Security Associations" ]
        }
}

Where each id contains a list of the lines that start with it.

This is the full output from the StrongSwan command.

sample = """
Status of IKE charon daemon (strongSwan 5.9.1, Linux 4.15.0-162-generic, x86_64):
  uptime: 25 hours, since Mar 23 15:23:53 2022
  worker threads: 11 of 16 idle, 5/0/0/0 working, job queue: 0/0/0/0, scheduled: 10
  loaded plugins: charon aesni 
Listening IP addresses:
  169.254.123.2
  192.168.51.254
Connections:
     oof-1-1:  %any...10.1.0.242  IKEv2, dpddelay=30s
     oof-1-1:   remote: [server] uses public key authentication
     oof-1-1:   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restart
     oof-1-2:  %any...10.1.0.242  IKEv2, dpddelay=30s
     oof-1-2:   remote: [server] uses public key authentication
     oof-1-2:   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restart
     oof-2-1:  %any...10.1.0.242  IKEv2, dpddelay=30s
     oof-2-1:   remote: [server] uses public key authentication
     oof-2-1:   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restartd
Security Associations:
     oof-1-1:  %any...10.1.0.242  IKEv2, dpddelay=30s
     oof-1-1:   remote: [server] uses public key authentication
     oof-1-1:   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restart
     oof-1-2:  %any...10.1.0.242  IKEv2, dpddelay=30s
     oof-1-2:   remote: [server] uses public key authentication
     oof-1-2:   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restart
     oof-2-1:  %any...10.1.0.242  IKEv2, dpddelay=30s
     oof-2-1:   remote: [server] uses public key authentication
     oof-2-1:   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restartd
"""

And this is the sample that is used in the parsing solution:

sample = """
Connections:
     oof-1-1:  %any...10.1.0.242  IKEv2, dpddelay=30s
     oof-1-1:   remote: [server] uses public key authentication
     oof-1-1:   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restart
     oof-1-2:  %any...10.1.0.242  IKEv2, dpddelay=30s
     oof-1-2:   remote: [server] uses public key authentication
     oof-1-2:   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restart
     oof-2-1:  %any...10.1.0.242  IKEv2, dpddelay=30s
     oof-2-1:   remote: [server] uses public key authentication
     oof-2-1:   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restartd
Security Associations:
     oof-1-1:  %any...10.1.0.242  IKEv2, dpddelay=30s
     oof-1-1:   remote: [server] uses public key authentication
     oof-1-1:   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restart
     oof-1-2:  %any...10.1.0.242  IKEv2, dpddelay=30s
     oof-1-2:   remote: [server] uses public key authentication
     oof-1-2:   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restart
     oof-2-1:  %any...10.1.0.242  IKEv2, dpddelay=30s
     oof-2-1:   remote: [server] uses public key authentication
     oof-2-1:   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restartd
"""

Solution

Regrouping the lines after they have been matched is the most direct way to handle this. Here is the BNF for the structure you are trying to parse:

group ::= label ':' line...
label ::= word...
line ::= prefix ':' rest_of_line
prefix ::= word '-' int '-' int

where word is a Word of alphas, int is a Word of nums, and '...' indicates repetition.
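As a quick cross-check of the prefix rule, here is a minimal sketch that uses a plain regular expression instead of pyparsing. The names PREFIX_RE and split_prefix, and the group names word, terminator and instance, are made up for this illustration; the terminator/instance reading of the two numbers follows the description in the question.

```python
import re

# word '-' int '-' int, mirroring the BNF prefix rule
# (ASCII letters and digits only, like pp.alphas and pp.nums)
PREFIX_RE = re.compile(r"(?P<word>[A-Za-z]+)-(?P<terminator>[0-9]+)-(?P<instance>[0-9]+)")


def split_prefix(prefix):
    """Split an id such as 'oof-2-1' into (word, terminator, instance), or return None."""
    m = PREFIX_RE.fullmatch(prefix)
    if m is None:
        return None
    return m["word"], int(m["terminator"]), int(m["instance"])


print(split_prefix("oof-2-1"))
print(split_prefix("Security Associations"))
```

A section label such as "Security Associations" does not fit the pattern, which is what lets the grammar tell a new section apart from another prefixed line.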

This translates to pyparsing as:

import pyparsing as pp

COLON = pp.Suppress(":")
label = pp.Combine(
            pp.Word(pp.alphas)[1, ...], adjacent=False, joinString=" "
            )
prefix = pp.Combine(
            pp.Word(pp.alphas) + "-" + pp.Word(pp.nums) + "-" + pp.Word(pp.nums)
            )
post_prefix = COLON + pp.restOfLine
line = pp.Group(prefix("prefix") + post_prefix)
lines = pp.Group(line[...])
group = pp.Group(label("group_label") + COLON + lines("subgroups"))

Pyparsing can generate a railroad diagram of this grammar for you (see the create_diagram call in the full source at the end).

This parses your text, but to regroup the lines by their prefixes, we can add a parse action that uses itertools.groupby:

def regroup_lines(t):
    from itertools import groupby
    from operator import itemgetter

    ret = pp.ParseResults([])
    parsed_lines = t[0]
    for prefix, subgroup in groupby(parsed_lines, key=itemgetter("prefix")):
        # each line in subgroup has the prefix and the rest of the line after the ':'
        # repackage the multiple lines into a single group that is labeled with 
        # the common prefix, and contains the line contents
        ret.append(pp.ParseResults.from_dict(
            {
                'prefix': prefix,
                'lines': [line[1] for line in subgroup],
            }
        ))
    return ret

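lines.add_parse_action(regroup_lines)

# lines("subgroups") above made a copy of lines before this parse action was
# attached, so rebuild group to pick it up
group = pp.Group(label("group_label") + COLON + lines("subgroups"))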

By using a parse action, the regrouping is done at parse time, so no additional post-parsing processing is needed.
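One thing to keep in mind is that itertools.groupby only merges items that sit next to each other. That suits this output, where all lines for one id are listed together, but if the same id could reappear later in a section, the lines would need a sort first. A small stdlib-only sketch with made-up rows (the runs helper and the row dicts are hypothetical, not part of the grammar):

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"prefix": "oof-1-1", "text": "line 1"},
    {"prefix": "oof-1-1", "text": "line 2"},
    {"prefix": "oof-1-2", "text": "line 3"},
    {"prefix": "oof-1-1", "text": "line 4"},  # same prefix again, but not adjacent
]


def runs(rows):
    """Collect the text of each consecutive run of rows that share a prefix."""
    return [
        (prefix, [row["text"] for row in run])
        for prefix, run in groupby(rows, key=itemgetter("prefix"))
    ]


# Without sorting, 'oof-1-1' shows up as two separate runs.
adjacent_only = runs(rows)

# sorted() is stable, so lines keep their original order within each prefix.
merged = runs(sorted(rows, key=itemgetter("prefix")))

print(adjacent_only)
print(merged)
```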

Now we can parse your sample and get the regrouped results:

results = group[...].parseString(sample)
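The grammar expects the text to start at a section label, and the header lines of the full statusall output (uptime, loaded plugins, listening addresses) do not fit the group rule. One way to get from the full output to the trimmed sample is to cut everything before the first section. This is a minimal sketch, assuming a line reading exactly "Connections:" always marks the start of the part you care about; trim_to_sections is a made-up helper name.

```python
def trim_to_sections(text, first_label="Connections:"):
    """Return text starting at the line that holds first_label, or "" if it is absent."""
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == first_label:
            return "".join(lines[i:])
    return ""


full_output = (
    "Status of IKE charon daemon (strongSwan 5.9.1, Linux 4.15.0-162-generic, x86_64):\n"
    "  uptime: 25 hours, since Mar 23 15:23:53 2022\n"
    "Listening IP addresses:\n"
    "  169.254.123.2\n"
    "Connections:\n"
    "     oof-1-1:  %any...10.1.0.242  IKEv2, dpddelay=30s\n"
)

trimmed = trim_to_sections(full_output)
print(trimmed)
```

The trimmed string can then be passed to parseString in place of sample.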

Here is a short function to print out the parsed groups:

def print_groups(parsed):
    for group in parsed:
        print(group.group_label)
        for subgroup in group.subgroups:
            print(f"- {subgroup.prefix}")
            for line in subgroup.lines:
                print(f"  {line!r}")
        print()

print_groups(results)

Which gives:

Connections
- oof-1-1
  '  %any...10.1.0.242  IKEv2, dpddelay=30s'
  '   remote: [server] uses public key authentication'
  '   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restart'
- oof-1-2
  '  %any...10.1.0.242  IKEv2, dpddelay=30s'
  '   remote: [server] uses public key authentication'
  '   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restart'
- oof-2-1
  '  %any...10.1.0.242  IKEv2, dpddelay=30s'
  '   remote: [server] uses public key authentication'
  '   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restartd'

Security Associations
- oof-1-1
  '  %any...10.1.0.242  IKEv2, dpddelay=30s'
  '   remote: [server] uses public key authentication'
  '   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restart'
- oof-1-2
  '  %any...10.1.0.242  IKEv2, dpddelay=30s'
  '   remote: [server] uses public key authentication'
  '   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restart'
- oof-2-1
  '  %any...10.1.0.242  IKEv2, dpddelay=30s'
  '   remote: [server] uses public key authentication'
  '   child:  dynamic === 0.0.0.0/0 TUNNEL, dpdaction=restartd'
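To get from these grouped results to the dict shape asked for in the question, the same attribute access used by print_groups can be reused. The sketch below is one possible way to do it, not part of pyparsing: groups_to_dict and KEY_NAMES are hypothetical names, and the mapping of section labels to 'connections' and 'sec_assocs' is an assumption taken from the question. The SimpleNamespace objects are stand-ins that only mimic the attribute layout of the parse results so the snippet runs on its own; with the grammar above you would pass results directly.

```python
from types import SimpleNamespace

# Hypothetical mapping from section labels to the dict keys used in the question.
KEY_NAMES = {"Connections": "connections", "Security Associations": "sec_assocs"}


def groups_to_dict(parsed, key_names=KEY_NAMES):
    """Build {section_key: {prefix: [lines]}} from the grouped results."""
    out = {}
    for group in parsed:
        # fall back to the raw label for sections that have no mapped key
        key = key_names.get(group.group_label, group.group_label)
        section = out.setdefault(key, {})
        for subgroup in group.subgroups:
            # extend rather than assign, in case a prefix appears in more than one run
            section.setdefault(subgroup.prefix, []).extend(subgroup.lines)
    return out


# Stand-in data with the same attributes as the parsed groups (not real parser output).
fake_results = [
    SimpleNamespace(
        group_label="Connections",
        subgroups=[
            SimpleNamespace(
                prefix="oof-1-1",
                lines=[
                    "  %any...10.1.0.242  IKEv2, dpddelay=30s",
                    "   remote: [server] uses public key authentication",
                ],
            ),
        ],
    ),
    SimpleNamespace(
        group_label="Security Associations",
        subgroups=[
            SimpleNamespace(
                prefix="oof-1-1",
                lines=["  %any...10.1.0.242  IKEv2, dpddelay=30s"],
            ),
        ],
    ),
]

as_dict = groups_to_dict(fake_results)
print(as_dict)
```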

Here is the full source for the working example (it uses the trimmed sample string defined above):

import pyparsing as pp

COLON = pp.Suppress(":")
label = pp.Combine(pp.Word(pp.alphas)[1, ...], adjacent=False, joinString=" ")
label.setName("label")
prefix = pp.Combine(pp.Word(pp.alphas) + "-" + pp.Word(pp.nums) + "-" + pp.Word(pp.nums))
prefix.setName("prefix")
post_prefix = COLON + pp.restOfLine
line = pp.Group(prefix("prefix") + post_prefix)
lines = pp.Group(line[...])


def regroup_lines(t):
    from itertools import groupby
    from operator import itemgetter

    ret = pp.ParseResults([])
    for prefix, subgroup in groupby(t[0], key=itemgetter("prefix")):
        ret.append(pp.ParseResults.from_dict(
            {
                'prefix': prefix,
                'lines': [line[1] for line in subgroup],
            }
        ))
    return ret
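# attach the parse action before lines("subgroups") is used below,
# since that call works on a copy of lines
lines.add_parse_action(regroup_lines)

group = pp.Group(label("group_label") + COLON + lines("subgroups"))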
pp.autoname_elements()
group.create_diagram("groupby_1.html", show_results_names=True)
results = group[...].parseString(sample)


def print_groups(parsed):
    for group in parsed:
        print(group.group_label)
        for subgroup in group.subgroups:
            print(f"- {subgroup.prefix}")
            for line in subgroup.lines:
                print(f"  {line!r}")
        print()

print_groups(results)