I'm trying to extract the source and destination MAC and IP addresses, and the number of packets transmitted, from the output of the command "ovs-dpctl dump-flows". The output looks like this:
in_port(2),eth(src=00:26:55:e8:b0:43,dst=bc:30:5b:f7:07:fc),eth_type(0x0806),arp(sip=193.170.192.129,tip=193.170.192.142,op=2,sha=00:26:55:e8:b0:43,tha=bc:30:5b:f7:07:fc), packets:0, bytes:0, used:never, actions:1
in_port(2),eth(src=bc:30:5b:f6:dd:fc,dst=bc:30:5b:f7:07:fc),eth_type(0x0800),ipv4(src=193.170.192.143,dst=193.170.192.142,proto=6,tos=0,ttl=64,frag=no),tcp(src=45969,dst=5672), packets:1, bytes:87, used:4.040s, flags:P., actions:1
in_port(2),eth(src=bc:30:5b:f6:dd:fc,dst=bc:30:5b:f7:07:fc),eth_type(0x0800),ipv4(src=193.170.192.143,dst=193.170.192.142,proto=6,tos=0,ttl=64,frag=no),tcp(src=45992,dst=5672), packets:118412, bytes:21787661, used:2.168s, flags:P., actions:1
in_port(2),eth(src=00:18:6e:3a:aa:e8,dst=01:80:c2:00:00:00), packets:29131, bytes:1864384, used:1.200s, actions:drop
Here is my code:
from pyparsing import *
import datetime,time
import os
f = os.popen('ovs-dpctl dump-flows ovs-system')
flows = f.read()
print("Flows are ", flows)
LBRACE,RBRACE,COMMA,EQUAL,COLON = map(Suppress,'(),=:')
in_port = packets = proto = tos = ttl = src = dst = op = Word(nums)
ipAddress = Combine(Word(nums) + ('.' + Word(nums))*3)
twohex = Word(hexnums,exact=2)
macAddress = Combine(twohex + (':'+twohex)*5)
eth_type = Combine('0x' + Word(hexnums,exact=4))
frag = Word
flowTcp = "in_port" + LBRACE + in_port("in_port") + RBRACE + COMMA + "eth" + LBRACE + "src" + EQUAL + macAddress("src") + COMMA + "dst" + EQUAL + macAddress("dst") + RBRACE + COMMA + "eth_type" + LBRACE + eth_type("eth_type") + RBRACE + COMMA + "ipv4" + LBRACE + "src" + EQUAL + ipAddress("src") + COMMA + "dst" + EQUAL + ipAddress("dst") + COMMA + "proto" + EQUAL + proto("proto") + COMMA + "tos" + EQUAL + tos("tos") + COMMA + "ttl" + EQUAL + ttl("ttl") + COMMA + "frag" + EQUAL + frag("frag") + RBRACE + COMMA + "tcp" + LBRACE + "src" + EQUAL + src("srcPkt") + COMMA + "dst" + EQUAL + dst("dstPkt") + RBRACE + "packets" + COLON + packets("packets")
The MAC addresses, the IP addresses and the TCP ports all use the same field names, "src" and "dst". Because of these recurring names, I'm not able to parse and extract the data I need. How can this be done?
First I had to reformat your code, so that I could more easily see the structure in the parser:
flowTcp = ("in_port" + LBRACE + in_port("in_port") + RBRACE + COMMA +
           "eth" + LBRACE + "src" + EQUAL + macAddress("src") + COMMA +
           "dst" + EQUAL + macAddress("dst") + RBRACE + COMMA +
           "eth_type" + LBRACE + eth_type("eth_type") + RBRACE + COMMA +
           "ipv4" + LBRACE + "src" + EQUAL + ipAddress("src") + COMMA +
           "dst" + EQUAL + ipAddress("dst") + COMMA +
           "proto" + EQUAL + proto("proto") + COMMA +
           "tos" + EQUAL + tos("tos") + COMMA +
           "ttl" + EQUAL + ttl("ttl") + COMMA +
           "frag" + EQUAL + frag("frag") + RBRACE + COMMA +
           "tcp" + LBRACE +
               "src" + EQUAL + src("srcPkt") + COMMA +
               "dst" + EQUAL + dst("dstPkt") +
           RBRACE +
           "packets" + COLON + packets("packets"))
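Then, to parse the examples that you posted, I had to make some of these structures Optional, add the missing "arp" field, and fix your definition of frag (frag = Word binds the Word class itself, so frag("frag") builds Word("frag"), which only matches runs of the letters f, r, a and g):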
frag = oneOf("yes no")
flowTcp = ("in_port" + LBRACE + in_port("in_port") + RBRACE + COMMA +
           "eth" + LBRACE +
               "src" + EQUAL + macAddress("src") + COMMA +
               "dst" + EQUAL + macAddress("dst") +
           RBRACE + COMMA +
           Optional("eth_type" + LBRACE + eth_type("eth_type") + RBRACE + COMMA) +
           Optional("arp" + LBRACE +
                    "sip" + EQUAL + ipAddress("sip") + COMMA +
                    "tip" + EQUAL + ipAddress("tip") + COMMA +
                    "op" + EQUAL + op("op") + COMMA +
                    "sha" + EQUAL + macAddress("sha") + COMMA +
                    "tha" + EQUAL + macAddress("tha") +
                    RBRACE + COMMA) +
           Optional("ipv4" + LBRACE +
                    "src" + EQUAL + ipAddress("src") + COMMA +
                    "dst" + EQUAL + ipAddress("dst") + COMMA +
                    "proto" + EQUAL + proto("proto") + COMMA +
                    "tos" + EQUAL + tos("tos") + COMMA +
                    "ttl" + EQUAL + ttl("ttl") + COMMA +
                    "frag" + EQUAL + frag("frag") +
                    RBRACE + COMMA) +
           Optional("tcp" + LBRACE +
                    "src" + EQUAL + src("srcPkt") + COMMA +
                    "dst" + EQUAL + dst("dstPkt") +
                    RBRACE + COMMA) +
           "packets" + COLON + packets("packets"))
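At this point, the parser "works", but it has the problem you asked about: some results names, like "src" and "dst", are used more than once. Only the last value parsed under a repeated name is returned, so the IPv4 addresses hide the Ethernet addresses that were stored under the same names.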
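To make the collision concrete, here is a cut-down sketch, assuming pyparsing is installed. The sample string is a simplified flow with only the eth and ipv4 parts (the other ipv4 fields are left out for brevity), and it uses the camelCase parseString spelling from this answer, which pyparsing 3 still accepts alongside parse_string. With the flat parser, asking for src returns the IPv4 address; the MAC parsed earlier under the same name is no longer reachable that way.

```python
from pyparsing import Combine, Suppress, Word, hexnums, nums

LBRACE, RBRACE, COMMA, EQUAL = map(Suppress, "(),=")
ipAddress = Combine(Word(nums) + ("." + Word(nums)) * 3)
twohex = Word(hexnums, exact=2)
macAddress = Combine(twohex + (":" + twohex) * 5)

# Flat parser that reuses the results names "src" and "dst"
flat = ("eth" + LBRACE +
        "src" + EQUAL + macAddress("src") + COMMA +
        "dst" + EQUAL + macAddress("dst") + RBRACE + COMMA +
        "ipv4" + LBRACE +
        "src" + EQUAL + ipAddress("src") + COMMA +
        "dst" + EQUAL + ipAddress("dst") + RBRACE)

# Simplified input: only the eth and ipv4 parts of a real flow
text = ("eth(src=bc:30:5b:f6:dd:fc,dst=bc:30:5b:f7:07:fc),"
        "ipv4(src=193.170.192.143,dst=193.170.192.142)")
result = flat.parseString(text)

# The last match stored under each name wins, so the MACs are hidden
print(result.src)  # 193.170.192.143
print(result.dst)  # 193.170.192.142
```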
Obviously, you could just use different names, like "eth_src" and "tcp_src". But I suggest you use pyparsing's Group class to add structure to your parsed data. I pulled each substructure out and defined it as its own mini-parser:
eth = Group("eth" + LBRACE +
            "src" + EQUAL + macAddress("src") + COMMA +
            "dst" + EQUAL + macAddress("dst") +
            RBRACE)
arp = Group("arp" + LBRACE +
            "sip" + EQUAL + ipAddress("sip") + COMMA +
            "tip" + EQUAL + ipAddress("tip") + COMMA +
            "op" + EQUAL + op("op") + COMMA +
            "sha" + EQUAL + macAddress("sha") + COMMA +
            "tha" + EQUAL + macAddress("tha") +
            RBRACE)
ipv4 = Group("ipv4" + LBRACE + "src" + EQUAL + ipAddress("src") + COMMA +
             "dst" + EQUAL + ipAddress("dst") + COMMA +
             "proto" + EQUAL + proto("proto") + COMMA +
             "tos" + EQUAL + tos("tos") + COMMA +
             "ttl" + EQUAL + ttl("ttl") + COMMA +
             "frag" + EQUAL + frag("frag") +
             RBRACE)
tcp = Group("tcp" + LBRACE +
            "src" + EQUAL + src("srcPkt") + COMMA +
            "dst" + EQUAL + dst("dstPkt") +
            RBRACE)
Then I added each one back into the main parser, and gave each group a results name.
flowTcp = ("in_port" + LBRACE + in_port("in_port") + RBRACE + COMMA +
           eth("eth") + COMMA +
           Optional("eth_type" + LBRACE + eth_type("eth_type") + RBRACE + COMMA) +
           Optional(arp("arp") + COMMA) +
           Optional(ipv4("ipv4") + COMMA) +
           Optional(tcp("tcp") + COMMA) +
           "packets" + COLON + packets("packets"))
(So I did two things here: I Grouped the substructures, and I gave them names. I could have inlined the whole thing without breaking out eth, arp, etc., but I was getting lost trying to keep it all in one mother-of-all-statements.)
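Here is the Group-plus-name idea in miniature, as a sketch assuming pyparsing is installed and using a simplified flow that contains only the eth and ipv4 parts. Each Group keeps its own src and dst, and the results names given to the groups make them reachable as attributes, so the same field name can appear in both without clashing.

```python
from pyparsing import Combine, Group, Suppress, Word, hexnums, nums

LBRACE, RBRACE, COMMA, EQUAL = map(Suppress, "(),=")
ipAddress = Combine(Word(nums) + ("." + Word(nums)) * 3)
twohex = Word(hexnums, exact=2)
macAddress = Combine(twohex + (":" + twohex) * 5)

# Each Group holds its own results names, so "src"/"dst" stay separate
eth = Group("eth" + LBRACE +
            "src" + EQUAL + macAddress("src") + COMMA +
            "dst" + EQUAL + macAddress("dst") + RBRACE)
ipv4 = Group("ipv4" + LBRACE +
             "src" + EQUAL + ipAddress("src") + COMMA +
             "dst" + EQUAL + ipAddress("dst") + RBRACE)

# Naming each Group makes it accessible as an attribute of the result
flow = eth("eth") + COMMA + ipv4("ipv4")

# Simplified input: only the eth and ipv4 parts of a real flow
text = ("eth(src=bc:30:5b:f6:dd:fc,dst=bc:30:5b:f7:07:fc),"
        "ipv4(src=193.170.192.143,dst=193.170.192.142)")
result = flow.parseString(text)

print(result.eth.src)   # bc:30:5b:f6:dd:fc
print(result.ipv4.src)  # 193.170.192.143
```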
Now I parsed your four examples and dumped out the results. The dump() method shows the structure of the parsed results, and the sample code shows how to access the substructures using normal attribute naming (like flowTcpValues.eth.src).
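# data: the individual flow lines, e.g. taken from the flows string read above
data = [line for line in flows.splitlines() if line.strip()]
for d in data:
    print(d)
    flowTcpValues = flowTcp.parseString(d)
    print(flowTcpValues.dump())
    print(flowTcpValues.packets)
    print(flowTcpValues.eth.src)
    print(flowTcpValues.eth.dst)
    print()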
Giving:
Flows are
in_port(2),eth(src=00:26:55:e8:b0:43,dst=bc:30:5b:f7:07:fc),eth_type(0x0806),arp(sip=193.170.192.129,tip=193.170.192.142,op=2,sha=00:26:55:e8:b0:43,tha=bc:30:5b:f7:07:fc), packets:0, bytes:0, used:never, actions:1
['in_port', '2', ['eth', 'src', '00:26:55:e8:b0:43', 'dst', 'bc:30:5b:f7:07:fc'], 'eth_type', '0x0806', ['arp', 'sip', '193.170.192.129', 'tip', '193.170.192.142', 'op', '2', 'sha', '00:26:55:e8:b0:43', 'tha', 'bc:30:5b:f7:07:fc'], 'packets', '0']
- arp: ['arp', 'sip', '193.170.192.129', 'tip', '193.170.192.142', 'op', '2', 'sha', '00:26:55:e8:b0:43', 'tha', 'bc:30:5b:f7:07:fc']
  - op: 2
  - sha: 00:26:55:e8:b0:43
  - sip: 193.170.192.129
  - tha: bc:30:5b:f7:07:fc
  - tip: 193.170.192.142
- eth: ['eth', 'src', '00:26:55:e8:b0:43', 'dst', 'bc:30:5b:f7:07:fc']
  - dst: bc:30:5b:f7:07:fc
  - src: 00:26:55:e8:b0:43
- eth_type: 0x0806
- in_port: 2
- packets: 0
0
00:26:55:e8:b0:43
bc:30:5b:f7:07:fc
in_port(2),eth(src=bc:30:5b:f6:dd:fc,dst=bc:30:5b:f7:07:fc),eth_type(0x0800),ipv4(src=193.170.192.143,dst=193.170.192.142,proto=6,tos=0,ttl=64,frag=no),tcp(src=45969,dst=5672), packets:1, bytes:87, used:4.040s, flags:P., actions:1
['in_port', '2', ['eth', 'src', 'bc:30:5b:f6:dd:fc', 'dst', 'bc:30:5b:f7:07:fc'], 'eth_type', '0x0800', ['ipv4', 'src', '193.170.192.143', 'dst', '193.170.192.142', 'proto', '6', 'tos', '0', 'ttl', '64', 'frag', 'no'], ['tcp', 'src', '45969', 'dst', '5672'], 'packets', '1']
- eth: ['eth', 'src', 'bc:30:5b:f6:dd:fc', 'dst', 'bc:30:5b:f7:07:fc']
  - dst: bc:30:5b:f7:07:fc
  - src: bc:30:5b:f6:dd:fc
- eth_type: 0x0800
- in_port: 2
- ipv4: ['ipv4', 'src', '193.170.192.143', 'dst', '193.170.192.142', 'proto', '6', 'tos', '0', 'ttl', '64', 'frag', 'no']
  - dst: 193.170.192.142
  - frag: no
  - proto: 6
  - src: 193.170.192.143
  - tos: 0
  - ttl: 64
- packets: 1
- tcp: ['tcp', 'src', '45969', 'dst', '5672']
  - dstPkt: 5672
  - srcPkt: 45969
1
bc:30:5b:f6:dd:fc
bc:30:5b:f7:07:fc
in_port(2),eth(src=bc:30:5b:f6:dd:fc,dst=bc:30:5b:f7:07:fc),eth_type(0x0800),ipv4(src=193.170.192.143,dst=193.170.192.142,proto=6,tos=0,ttl=64,frag=no),tcp(src=45992,dst=5672), packets:118412, bytes:21787661, used:2.168s, flags:P., actions:1
['in_port', '2', ['eth', 'src', 'bc:30:5b:f6:dd:fc', 'dst', 'bc:30:5b:f7:07:fc'], 'eth_type', '0x0800', ['ipv4', 'src', '193.170.192.143', 'dst', '193.170.192.142', 'proto', '6', 'tos', '0', 'ttl', '64', 'frag', 'no'], ['tcp', 'src', '45992', 'dst', '5672'], 'packets', '118412']
- eth: ['eth', 'src', 'bc:30:5b:f6:dd:fc', 'dst', 'bc:30:5b:f7:07:fc']
  - dst: bc:30:5b:f7:07:fc
  - src: bc:30:5b:f6:dd:fc
- eth_type: 0x0800
- in_port: 2
- ipv4: ['ipv4', 'src', '193.170.192.143', 'dst', '193.170.192.142', 'proto', '6', 'tos', '0', 'ttl', '64', 'frag', 'no']
  - dst: 193.170.192.142
  - frag: no
  - proto: 6
  - src: 193.170.192.143
  - tos: 0
  - ttl: 64
- packets: 118412
- tcp: ['tcp', 'src', '45992', 'dst', '5672']
  - dstPkt: 5672
  - srcPkt: 45992
118412
bc:30:5b:f6:dd:fc
bc:30:5b:f7:07:fc
in_port(2),eth(src=00:18:6e:3a:aa:e8,dst=01:80:c2:00:00:00), packets:29131, bytes:1864384, used:1.200s, actions:drop
['in_port', '2', ['eth', 'src', '00:18:6e:3a:aa:e8', 'dst', '01:80:c2:00:00:00'], 'packets', '29131']
- eth: ['eth', 'src', '00:18:6e:3a:aa:e8', 'dst', '01:80:c2:00:00:00']
  - dst: 01:80:c2:00:00:00
  - src: 00:18:6e:3a:aa:e8
- in_port: 2
- packets: 29131
29131
00:18:6e:3a:aa:e8
01:80:c2:00:00:00
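Since the original goal was just the MACs, IPs and packet counts, here is one way to boil each flow down to a plain dict. This is a sketch, not a definitive implementation: summarize is a hypothetical helper name, the grammar is the one built above restated so the block runs on its own (with a single Word(nums) reused for the numeric fields), and treating the ARP sip/tip as the source/destination IPs is my assumption. Flows with neither ipv4 nor arp get None for the IPs. Because parseString does not require the whole line to match, the trailing bytes/used/actions fields are simply ignored.

```python
from pyparsing import (Combine, Group, Optional, Suppress, Word, hexnums,
                       nums, oneOf)

# Grammar from the answer above, restated so this sketch is self-contained
LBRACE, RBRACE, COMMA, EQUAL, COLON = map(Suppress, "(),=:")
integer = Word(nums)
ipAddress = Combine(Word(nums) + ("." + Word(nums)) * 3)
twohex = Word(hexnums, exact=2)
macAddress = Combine(twohex + (":" + twohex) * 5)
eth_type = Combine("0x" + Word(hexnums, exact=4))
frag = oneOf("yes no")

eth = Group("eth" + LBRACE + "src" + EQUAL + macAddress("src") + COMMA +
            "dst" + EQUAL + macAddress("dst") + RBRACE)
arp = Group("arp" + LBRACE + "sip" + EQUAL + ipAddress("sip") + COMMA +
            "tip" + EQUAL + ipAddress("tip") + COMMA +
            "op" + EQUAL + integer("op") + COMMA +
            "sha" + EQUAL + macAddress("sha") + COMMA +
            "tha" + EQUAL + macAddress("tha") + RBRACE)
ipv4 = Group("ipv4" + LBRACE + "src" + EQUAL + ipAddress("src") + COMMA +
             "dst" + EQUAL + ipAddress("dst") + COMMA +
             "proto" + EQUAL + integer("proto") + COMMA +
             "tos" + EQUAL + integer("tos") + COMMA +
             "ttl" + EQUAL + integer("ttl") + COMMA +
             "frag" + EQUAL + frag("frag") + RBRACE)
tcp = Group("tcp" + LBRACE + "src" + EQUAL + integer("srcPkt") + COMMA +
            "dst" + EQUAL + integer("dstPkt") + RBRACE)

flowTcp = ("in_port" + LBRACE + integer("in_port") + RBRACE + COMMA +
           eth("eth") + COMMA +
           Optional("eth_type" + LBRACE + eth_type("eth_type") + RBRACE + COMMA) +
           Optional(arp("arp") + COMMA) +
           Optional(ipv4("ipv4") + COMMA) +
           Optional(tcp("tcp") + COMMA) +
           "packets" + COLON + integer("packets"))


def summarize(line):
    """Hypothetical helper: pull MACs, IPs and packet count from one flow."""
    r = flowTcp.parseString(line)
    src_ip = dst_ip = None
    if "ipv4" in r:
        src_ip, dst_ip = r.ipv4.src, r.ipv4.dst
    elif "arp" in r:
        # Assumption: treat the ARP sender/target IPs as source/destination
        src_ip, dst_ip = r.arp.sip, r.arp.tip
    return {
        "src_mac": r.eth.src,
        "dst_mac": r.eth.dst,
        "src_ip": src_ip,
        "dst_ip": dst_ip,
        "packets": int(r.packets),
    }


# The four flows from the question
sample_lines = [
    "in_port(2),eth(src=00:26:55:e8:b0:43,dst=bc:30:5b:f7:07:fc),eth_type(0x0806),"
    "arp(sip=193.170.192.129,tip=193.170.192.142,op=2,sha=00:26:55:e8:b0:43,"
    "tha=bc:30:5b:f7:07:fc), packets:0, bytes:0, used:never, actions:1",
    "in_port(2),eth(src=bc:30:5b:f6:dd:fc,dst=bc:30:5b:f7:07:fc),eth_type(0x0800),"
    "ipv4(src=193.170.192.143,dst=193.170.192.142,proto=6,tos=0,ttl=64,frag=no),"
    "tcp(src=45969,dst=5672), packets:1, bytes:87, used:4.040s, flags:P., actions:1",
    "in_port(2),eth(src=bc:30:5b:f6:dd:fc,dst=bc:30:5b:f7:07:fc),eth_type(0x0800),"
    "ipv4(src=193.170.192.143,dst=193.170.192.142,proto=6,tos=0,ttl=64,frag=no),"
    "tcp(src=45992,dst=5672), packets:118412, bytes:21787661, used:2.168s, "
    "flags:P., actions:1",
    "in_port(2),eth(src=00:18:6e:3a:aa:e8,dst=01:80:c2:00:00:00), "
    "packets:29131, bytes:1864384, used:1.200s, actions:drop",
]

for line in sample_lines:
    print(summarize(line))
```

Checking "ipv4" in r before touching r.ipv4 keeps the None case explicit; plain attribute access on a name that was never matched returns an empty string rather than raising, which is easy to mistake for real data.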