Tags: python, python-2.7, pyparsing, openvswitch

Extract data from ovs-dpctl dump-flows output using pyparsing


I'm trying to extract the source and destination MAC and IP addresses and the packet count from the output of the command "ovs-dpctl dump-flows". The output looks like this:

in_port(2),eth(src=00:26:55:e8:b0:43,dst=bc:30:5b:f7:07:fc),eth_type(0x0806),arp(sip=193.170.192.129,tip=193.170.192.142,op=2,sha=00:26:55:e8:b0:43,tha=bc:30:5b:f7:07:fc), packets:0, bytes:0, used:never, actions:1
in_port(2),eth(src=bc:30:5b:f6:dd:fc,dst=bc:30:5b:f7:07:fc),eth_type(0x0800),ipv4(src=193.170.192.143,dst=193.170.192.142,proto=6,tos=0,ttl=64,frag=no),tcp(src=45969,dst=5672), packets:1, bytes:87, used:4.040s, flags:P., actions:1
in_port(2),eth(src=bc:30:5b:f6:dd:fc,dst=bc:30:5b:f7:07:fc),eth_type(0x0800),ipv4(src=193.170.192.143,dst=193.170.192.142,proto=6,tos=0,ttl=64,frag=no),tcp(src=45992,dst=5672), packets:118412, bytes:21787661, used:2.168s, flags:P., actions:1
in_port(2),eth(src=00:18:6e:3a:aa:e8,dst=01:80:c2:00:00:00), packets:29131, bytes:1864384, used:1.200s, actions:drop

Here is my code so far:

from pyparsing import *
import datetime,time
import os
f = os.popen('ovs-dpctl dump-flows ovs-system')
flows = f.read()
print "Flows are ", flows

LBRACE,RBRACE,COMMA,EQUAL,COLON = map(Suppress,'(),=:')
in_port = packets = proto = tos = ttl = src = dst = op = Word(nums)
ipAddress = Combine(Word(nums) + ('.' + Word(nums))*3)
twohex = Word(hexnums,exact=2)
macAddress = Combine(twohex + (':'+twohex)*5)
eth_type = Combine('0x' + Word(hexnums,exact=4))
frag = Word

flowTcp = "in_port" + LBRACE + in_port("in_port") + RBRACE + COMMA + "eth" + LBRACE + "src" + EQUAL + macAddress("src") + COMMA + "dst" + EQUAL + macAddress("dst") + RBRACE + COMMA + "eth_type" + LBRACE + eth_type("eth_type") + RBRACE + COMMA + "ipv4" + LBRACE + "src" + EQUAL + ipAddress("src") + COMMA + "dst" + EQUAL + ipAddress("dst") + COMMA + "proto" + EQUAL + proto("proto") + COMMA + "tos" + EQUAL + tos("tos") + COMMA + "ttl" + EQUAL + ttl("ttl") + COMMA + "frag" + EQUAL + frag("frag") + RBRACE + COMMA + "tcp" + LBRACE + "src" + EQUAL + src("srcPkt") + COMMA + "dst" + EQUAL + dst("dstPkt") + RBRACE + "packets" + COLON + packets("packets")

The MAC address, IP address, and tcp fields all use the same names, "src" and "dst", so I'm not able to parse and extract the required data because of the recurring names. Please suggest how this can be done.


Solution

  • First I had to reformat your code, so that I could more easily see the structure in the parser:

    flowTcp = ("in_port" + LBRACE + in_port("in_port") + RBRACE + COMMA + 
                "eth" + LBRACE + "src" + EQUAL + macAddress("src") + COMMA + 
                "dst" + EQUAL + macAddress("dst") + RBRACE + COMMA + 
                "eth_type" + LBRACE + eth_type("eth_type") + RBRACE + COMMA + 
                "ipv4" + LBRACE + "src" + EQUAL + ipAddress("src") + COMMA + 
                    "dst" + EQUAL + ipAddress("dst") + COMMA + 
                    "proto" + EQUAL + proto("proto") + COMMA + 
                    "tos" + EQUAL + tos("tos") + COMMA + 
                    "ttl" + EQUAL + ttl("ttl") + COMMA + 
                    "frag" + EQUAL + frag("frag") + RBRACE + COMMA + 
                "tcp" + LBRACE + 
                    "src" + EQUAL + src("srcPkt") + COMMA + 
                    "dst" + EQUAL + dst("dstPkt") + 
                    RBRACE + 
                "packets" + COLON + packets("packets"))
    

    Then, to parse the examples that you posted, I had to make some of these structures Optional, and add the missing "eth" and "arp" fields (and fix your definition of frag):

    frag = oneOf("yes no")
    flowTcp = ("in_port" + LBRACE + in_port("in_port") + RBRACE + COMMA + 
                "eth" + LBRACE + 
                    "src" + EQUAL + macAddress("src") + COMMA + 
                    "dst" + EQUAL + macAddress("dst") + 
                    RBRACE + COMMA + 
                Optional("eth_type" + LBRACE + eth_type("eth_type") + RBRACE + COMMA) +
                Optional("arp" + LBRACE +
                    "sip" + EQUAL + ipAddress("sip") + COMMA +
                    "tip" + EQUAL + ipAddress("tip") + COMMA +
                    "op" + EQUAL + op("op") + COMMA + 
                    "sha" + EQUAL + macAddress("sha") + COMMA + 
                    "tha" + EQUAL + macAddress("tha") + 
                    RBRACE + COMMA) +
                Optional("ipv4" + LBRACE + 
                    "src" + EQUAL + ipAddress("src") + COMMA + 
                    "dst" + EQUAL + ipAddress("dst") + COMMA + 
                    "proto" + EQUAL + proto("proto") + COMMA + 
                    "tos" + EQUAL + tos("tos") + COMMA + 
                    "ttl" + EQUAL + ttl("ttl") + COMMA + 
                    "frag" + EQUAL + frag("frag") + 
                    RBRACE + COMMA) +
                Optional("tcp" + LBRACE + 
                    "src" + EQUAL + src("srcPkt") + COMMA + 
                    "dst" + EQUAL + dst("dstPkt") + 
                    RBRACE + COMMA) +
                "packets" + COLON + packets("packets"))
    

    At this point, the parser "works", but it has the problem you have asked about, which is that you have repeated use of some results names, like "src", "dst", and so on.
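
    A minimal sketch of that overwriting behavior, using a cut-down grammar with only an eth and a tcp section (the names flat and flat_result and the shortened input line are made up for illustration). By default pyparsing keeps only the last value bound to a results name, so the MAC address is still in the token list but can no longer be reached by name:

```python
from pyparsing import Word, Combine, Suppress, hexnums, nums

LPAR, RPAR, COMMA, EQUAL = map(Suppress, "(),=")
twohex = Word(hexnums, exact=2)
macAddress = Combine(twohex + (":" + twohex) * 5)
port = Word(nums)

# Both fields are bound to the same results name, "src".
flat = ("eth" + LPAR + "src" + EQUAL + macAddress("src") + RPAR + COMMA +
        "tcp" + LPAR + "src" + EQUAL + port("src") + RPAR)

flat_result = flat.parseString("eth(src=bc:30:5b:f6:dd:fc),tcp(src=45969)")

print(flat_result.asList())  # the MAC address is still among the tokens
print(flat_result.src)       # prints 45969: the later "src" replaced the MAC
```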

    Obviously, you could just use different names, like "eth_src", "tcp_src". But I suggest you use pyparsing's Group class to add structure to your parsed data. I took each of the substructures out and defined it as its own mini-parser:

    eth = Group("eth" + LBRACE + 
                    "src" + EQUAL + macAddress("src") + COMMA + 
                    "dst" + EQUAL + macAddress("dst") + 
                    RBRACE)
    arp = Group("arp" + LBRACE +
                    "sip" + EQUAL + ipAddress("sip") + COMMA +
                    "tip" + EQUAL + ipAddress("tip") + COMMA +
                    "op" + EQUAL + op("op") + COMMA + 
                    "sha" + EQUAL + macAddress("sha") + COMMA + 
                    "tha" + EQUAL + macAddress("tha") + 
                    RBRACE)
    ipv4 = Group("ipv4" + LBRACE + "src" + EQUAL + ipAddress("src") + COMMA + 
                    "dst" + EQUAL + ipAddress("dst") + COMMA + 
                    "proto" + EQUAL + proto("proto") + COMMA + 
                    "tos" + EQUAL + tos("tos") + COMMA + 
                    "ttl" + EQUAL + ttl("ttl") + COMMA + 
                    "frag" + EQUAL + frag("frag") + 
                    RBRACE)
    tcp = Group("tcp" + LBRACE + 
                    "src" + EQUAL + src("srcPkt") + COMMA + 
                    "dst" + EQUAL + dst("dstPkt") + 
                    RBRACE)
    

    Then I added each one back into the main parser, and gave each group a results name.

    flowTcp = ("in_port" + LBRACE + in_port("in_port") + RBRACE + COMMA + 
                eth("eth") + COMMA + 
                Optional("eth_type" + LBRACE + eth_type("eth_type") + RBRACE + COMMA ) +
                Optional(arp("arp") + COMMA) +
                Optional(ipv4("ipv4") + COMMA) +
                Optional(tcp("tcp") + COMMA) +
                "packets" + COLON + packets("packets"))
    

    (So I did 2 things here - I Grouped the substructures, and I gave them names. I could have inlined the whole thing without breaking out eth, arp, etc., but I was getting lost trying to keep it all in one mother-of-all-statements.)
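
    For comparison, the flat alternative mentioned earlier (distinct results names instead of Group) might look like the sketch below. It covers only the eth and tcp sections, and the names flat_named, sample, and named_result, as well as the "eth_dst" and "tcp_dst" results names, are illustrative assumptions:

```python
from pyparsing import Word, Combine, Suppress, hexnums, nums

LPAR, RPAR, COMMA, EQUAL = map(Suppress, "(),=")
twohex = Word(hexnums, exact=2)
macAddress = Combine(twohex + (":" + twohex) * 5)
port = Word(nums)

# Each field gets a unique, prefixed results name instead of a Group.
flat_named = ("eth" + LPAR +
                  "src" + EQUAL + macAddress("eth_src") + COMMA +
                  "dst" + EQUAL + macAddress("eth_dst") +
                  RPAR + COMMA +
              "tcp" + LPAR +
                  "src" + EQUAL + port("tcp_src") + COMMA +
                  "dst" + EQUAL + port("tcp_dst") +
                  RPAR)

sample = "eth(src=bc:30:5b:f6:dd:fc,dst=bc:30:5b:f7:07:fc),tcp(src=45969,dst=5672)"
named_result = flat_named.parseString(sample)

print(named_result.eth_src)  # MAC address from the eth section
print(named_result.tcp_src)  # source value from the tcp section
```

    This keeps every value in one flat namespace, which is fine for a handful of fields. The Group version scales better, because each section carries its own "src" and "dst" without any prefixing.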

    Now I parsed your 4 examples and dumped out the results (data holds the four flow lines as a list of strings). The dump() method shows the structure of the parsed results, and the sample code shows how to access the sub-structures with normal attribute access (like flowTcpValues.eth.src).

    for d in data:
        print d
        flowTcpValues = flowTcp.parseString(d)
        print flowTcpValues.dump()
        print flowTcpValues.packets
        print flowTcpValues.eth.src
        print flowTcpValues.eth.dst
        print
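
    The loop above relies on a list called data whose construction is not shown. One way to build it, sketched here as an assumption, is to split the command output into lines. The sketch repeats the grouped grammar so that it runs on its own, stands in two of the sample lines for the os.popen call, and collects the requested fields into plain dictionaries. The summarize helper, the number parser, and the dictionary keys are hypothetical names, and print is written in call form so the code runs on both Python 2.7 and 3:

```python
from pyparsing import (Word, Combine, Group, Optional, Suppress,
                       nums, hexnums, oneOf)

LPAR, RPAR, COMMA, EQUAL, COLON = map(Suppress, "(),=:")
number = Word(nums)
ipAddress = Combine(Word(nums) + ("." + Word(nums)) * 3)
twohex = Word(hexnums, exact=2)
macAddress = Combine(twohex + (":" + twohex) * 5)
eth_type = Combine("0x" + Word(hexnums, exact=4))
frag = oneOf("yes no")

eth = Group("eth" + LPAR + "src" + EQUAL + macAddress("src") + COMMA +
            "dst" + EQUAL + macAddress("dst") + RPAR)
arp = Group("arp" + LPAR + "sip" + EQUAL + ipAddress("sip") + COMMA +
            "tip" + EQUAL + ipAddress("tip") + COMMA +
            "op" + EQUAL + number("op") + COMMA +
            "sha" + EQUAL + macAddress("sha") + COMMA +
            "tha" + EQUAL + macAddress("tha") + RPAR)
ipv4 = Group("ipv4" + LPAR + "src" + EQUAL + ipAddress("src") + COMMA +
             "dst" + EQUAL + ipAddress("dst") + COMMA +
             "proto" + EQUAL + number("proto") + COMMA +
             "tos" + EQUAL + number("tos") + COMMA +
             "ttl" + EQUAL + number("ttl") + COMMA +
             "frag" + EQUAL + frag("frag") + RPAR)
tcp = Group("tcp" + LPAR + "src" + EQUAL + number("srcPkt") + COMMA +
            "dst" + EQUAL + number("dstPkt") + RPAR)

flowTcp = ("in_port" + LPAR + number("in_port") + RPAR + COMMA +
           eth("eth") + COMMA +
           Optional("eth_type" + LPAR + eth_type("eth_type") + RPAR + COMMA) +
           Optional(arp("arp") + COMMA) +
           Optional(ipv4("ipv4") + COMMA) +
           Optional(tcp("tcp") + COMMA) +
           "packets" + COLON + number("packets"))

# Stand-in for the command output; in the original script this text
# comes from os.popen('ovs-dpctl dump-flows ovs-system').read().
flows = """\
in_port(2),eth(src=bc:30:5b:f6:dd:fc,dst=bc:30:5b:f7:07:fc),eth_type(0x0800),ipv4(src=193.170.192.143,dst=193.170.192.142,proto=6,tos=0,ttl=64,frag=no),tcp(src=45969,dst=5672), packets:1, bytes:87, used:4.040s, flags:P., actions:1
in_port(2),eth(src=00:18:6e:3a:aa:e8,dst=01:80:c2:00:00:00), packets:29131, bytes:1864384, used:1.200s, actions:drop
"""

# One flow per line; skip blank lines so parseString is never given one.
data = [line for line in flows.splitlines() if line.strip()]

def summarize(line):
    """Parse one flow line and return the fields of interest as a dict."""
    r = flowTcp.parseString(line)
    has_ip = "ipv4" in r  # the ipv4 section is optional in the grammar
    return {
        "eth_src": r.eth.src,
        "eth_dst": r.eth.dst,
        "ip_src": r.ipv4.src if has_ip else None,
        "ip_dst": r.ipv4.dst if has_ip else None,
        "packets": int(r.packets),
    }

records = [summarize(line) for line in data]
for rec in records:
    print(rec)
```

    The membership test on the parse results matters because an optional section that did not match has no results name, so attribute access on it would not return a group.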
    

    Giving:

    Flows are  
    in_port(2),eth(src=00:26:55:e8:b0:43,dst=bc:30:5b:f7:07:fc),eth_type(0x0806),arp(sip=193.170.192.129,tip=193.170.192.142,op=2,sha=00:26:55:e8:b0:43,tha=bc:30:5b:f7:07:fc), packets:0, bytes:0, used:never, actions:1
    ['in_port', '2', ['eth', 'src', '00:26:55:e8:b0:43', 'dst', 'bc:30:5b:f7:07:fc'], 'eth_type', '0x0806', ['arp', 'sip', '193.170.192.129', 'tip', '193.170.192.142', 'op', '2', 'sha', '00:26:55:e8:b0:43', 'tha', 'bc:30:5b:f7:07:fc'], 'packets', '0']
    - arp: ['arp', 'sip', '193.170.192.129', 'tip', '193.170.192.142', 'op', '2', 'sha', '00:26:55:e8:b0:43', 'tha', 'bc:30:5b:f7:07:fc']
      - op: 2
      - sha: 00:26:55:e8:b0:43
      - sip: 193.170.192.129
      - tha: bc:30:5b:f7:07:fc
      - tip: 193.170.192.142
    - eth: ['eth', 'src', '00:26:55:e8:b0:43', 'dst', 'bc:30:5b:f7:07:fc']
      - dst: bc:30:5b:f7:07:fc
      - src: 00:26:55:e8:b0:43
    - eth_type: 0x0806
    - in_port: 2
    - packets: 0
    0
    00:26:55:e8:b0:43
    bc:30:5b:f7:07:fc
    
    in_port(2),eth(src=bc:30:5b:f6:dd:fc,dst=bc:30:5b:f7:07:fc),eth_type(0x0800),ipv4(src=193.170.192.143,dst=193.170.192.142,proto=6,tos=0,ttl=64,frag=no),tcp(src=45969,dst=5672), packets:1, bytes:87, used:4.040s, flags:P., actions:1
    ['in_port', '2', ['eth', 'src', 'bc:30:5b:f6:dd:fc', 'dst', 'bc:30:5b:f7:07:fc'], 'eth_type', '0x0800', ['ipv4', 'src', '193.170.192.143', 'dst', '193.170.192.142', 'proto', '6', 'tos', '0', 'ttl', '64', 'frag', 'no'], ['tcp', 'src', '45969', 'dst', '5672'], 'packets', '1']
    - eth: ['eth', 'src', 'bc:30:5b:f6:dd:fc', 'dst', 'bc:30:5b:f7:07:fc']
      - dst: bc:30:5b:f7:07:fc
      - src: bc:30:5b:f6:dd:fc
    - eth_type: 0x0800
    - in_port: 2
    - ipv4: ['ipv4', 'src', '193.170.192.143', 'dst', '193.170.192.142', 'proto', '6', 'tos', '0', 'ttl', '64', 'frag', 'no']
      - dst: 193.170.192.142
      - frag: no
      - proto: 6
      - src: 193.170.192.143
      - tos: 0
      - ttl: 64
    - packets: 1
    - tcp: ['tcp', 'src', '45969', 'dst', '5672']
      - dstPkt: 5672
      - srcPkt: 45969
    1
    bc:30:5b:f6:dd:fc
    bc:30:5b:f7:07:fc
    
    in_port(2),eth(src=bc:30:5b:f6:dd:fc,dst=bc:30:5b:f7:07:fc),eth_type(0x0800),ipv4(src=193.170.192.143,dst=193.170.192.142,proto=6,tos=0,ttl=64,frag=no),tcp(src=45992,dst=5672), packets:118412, bytes:21787661, used:2.168s, flags:P., actions:1
    ['in_port', '2', ['eth', 'src', 'bc:30:5b:f6:dd:fc', 'dst', 'bc:30:5b:f7:07:fc'], 'eth_type', '0x0800', ['ipv4', 'src', '193.170.192.143', 'dst', '193.170.192.142', 'proto', '6', 'tos', '0', 'ttl', '64', 'frag', 'no'], ['tcp', 'src', '45992', 'dst', '5672'], 'packets', '118412']
    - eth: ['eth', 'src', 'bc:30:5b:f6:dd:fc', 'dst', 'bc:30:5b:f7:07:fc']
      - dst: bc:30:5b:f7:07:fc
      - src: bc:30:5b:f6:dd:fc
    - eth_type: 0x0800
    - in_port: 2
    - ipv4: ['ipv4', 'src', '193.170.192.143', 'dst', '193.170.192.142', 'proto', '6', 'tos', '0', 'ttl', '64', 'frag', 'no']
      - dst: 193.170.192.142
      - frag: no
      - proto: 6
      - src: 193.170.192.143
      - tos: 0
      - ttl: 64
    - packets: 118412
    - tcp: ['tcp', 'src', '45992', 'dst', '5672']
      - dstPkt: 5672
      - srcPkt: 45992
    118412
    bc:30:5b:f6:dd:fc
    bc:30:5b:f7:07:fc
    
    in_port(2),eth(src=00:18:6e:3a:aa:e8,dst=01:80:c2:00:00:00), packets:29131, bytes:1864384, used:1.200s, actions:drop
    ['in_port', '2', ['eth', 'src', '00:18:6e:3a:aa:e8', 'dst', '01:80:c2:00:00:00'], 'packets', '29131']
    - eth: ['eth', 'src', '00:18:6e:3a:aa:e8', 'dst', '01:80:c2:00:00:00']
      - dst: 01:80:c2:00:00:00
      - src: 00:18:6e:3a:aa:e8
    - in_port: 2
    - packets: 29131
    29131
    00:18:6e:3a:aa:e8
    01:80:c2:00:00:00