I am trying to parse a text file and convert it to a table (or JSON) using Lua. An example test file is as follows:
ipv4 2 tcp 6 3598 ESTABLISHED src=192.168.1.117 dst=137.194.2.78 sport=59078 dport=80 packets=4 bytes=298 src=137.194.2.78 dst=132.227.127.212 sport=80 dport=59078 packets=3 bytes=567 [ASSURED] mark=0 use=2
ipv4 2 udp 17 55 src=192.168.1.117 dst=157.56.149.60 sport=49991 dport=3544 packets=5 bytes=445 [UNREPLIED] src=157.56.149.60 dst=132.227.127.212 sport=3544 dport=49991 packets=0 bytes=0 mark=0 use=2
ipv4 2 tcp 6 3420 ESTABLISHED src=192.168.1.104 dst=193.51.224.187 sport=35918 dport=443 packets=19 bytes=2521 src=193.51.224.187 dst=132.227.127.212 sport=443 dport=35918 packets=16 bytes=9895 [ASSURED] mark=0 use=2
ipv4 2 udp 17 59 src=192.168.1.117 dst=192.168.1.255 sport=17500 dport=17500 packets=139 bytes=23908 [UNREPLIED] src=192.168.1.255 dst=192.168.1.117 sport=17500 dport=17500 packets=0 bytes=0 mark=0 use=2
...
Notice that the data in each line can be split into two parts based on the direction (forward and reverse path flows).
If you have a Linux system or an OpenWrt router, you can get a similar test file using the conntrack command, or by reading /proc/net/nf_conntrack.
What I wish to retrieve is the following information:
{ 1:
{
"bytes": 298,
"src": "192.168.1.117",
"sport": 59078,
"layer4": "tcp",
"dst": "137.194.2.78",
"dport": 80,
"layer3": "ipv4",
"packets": 4,
"rbytes": 567,
"rpackets": 3
},
{ 2: ...
where rbytes, rpackets are for the bytes and packets in the reverse direction (second half of line 1 in my example text file).
My parser is as follows:*
function conntrack(callback)
    local connt = {}
    if io.open("conntrack.temp", "r") then
        for line in io.lines("conntrack.temp") do
            line = line:match("^(.-( [^ =]+=).-)%2")
            local entry, flags = _parse_mixed_record(line, " +")
            if flags[6] ~= "TIME_WAIT" then
                entry.layer3 = flags[1]
                entry.layer4 = flags[3]
                for i=1, #entry do
                    entry[i] = nil
                end
                if callback then
                    callback(entry)
                else
                    connt[#connt+1] = entry
                end
            end
        end
    else
        return nil
    end
    return connt
end
function _parse_mixed_record(cnt, delimiter)
    delimiter = delimiter or " "
    local data = {}
    local flags = {}
    for i, l in pairs(cnt:split("\n")) do
        for j, f in pairs(l:split(delimiter)) do
            local k, x, v = f:match('([^%s][^:=]*) *([:=]*) *"*([^\n"]*)"*')
            if k then
                if x == "" then
                    table.insert(flags, k)
                else
                    data[k] = v
                end
            end
        end
    end
    return data, flags
end
Calling the above function (after including a simple split method in the code), I can parse only the first half of each line, so no rbytes or rpackets are parsed. I know the code responsible for this is
line = line:match("^(.-( [^ =]+=).-)%2")
A print(line) statement following this line in the code shows me:
ipv4 2 tcp 6 3598 ESTABLISHED src=192.168.1.117 dst=137.194.2.78 sport=59078 dport=80 packets=4 bytes=298
So the statement truncates each line using a confusing pattern, which I partly understand after playing around with it a bit. The part I still don't get is the %2 that occurs after the captures. I know it somehow refers back to what was captured, but how should I change this statement so that line contains both the forward-path byte and packet counts and the reverse-path ones?
My main question is: what exactly is the pattern in this statement? I'm probably going to remove this line to parse the whole statement, but I wanted to understand why the original coders are doing this.
I've been through the Lua pattern matching manual, but I'm still confused about referring to captures with %<some_number>. Why doesn't %1 or %3 work?
Two relevant stackoverflow questions I found: Q1, Q2. A deeper explanation would be appreciated.
Also, with the code I have provided here I currently can't recover the timeout value (the fifth word in line 1, 3598) or the connection state (ESTABLISHED, [ASSURED]). I'm still a beginner at Lua and hope to crack this soon.
*NOTE: This parser is my fixed version of the one available in the luci sys module on openwrt routers. See original luci.sys sourcecode for details.
While working on Attitude Adjustment 12.09, I noticed that its net.conntrack() isn't working because it fails to parse the object into proper JSON. The relevant functions using this pattern are in the sys.lua file: function conntrack(callback) and the internal function _parse_mixed_record(cnt, delimiter). My router used luci-0.11 and Lua 5.1.4.
That pattern was designed to keep only the forward part of each line. Here's how it does that. The second pair of parentheses, ( [^ =]+=), captures the first substring of the form " stuff=". Then the %2 at the end of the pattern will only match if that same string, " stuff=", appears again. So on a line like
ipv4 2 tcp 6 3598 ESTABLISHED src=192.168.1.117 dst=137.194.2.78 sport=59078 dport=80 packets=4 bytes=298 src=137.194.2.78 dst=132.227.127.212 sport=80 dport=59078 packets=3 bytes=567 [ASSURED] mark=0 use=2
the second capture will be " src=", so the first capture, which is what is assigned to line, will be the whole initial portion of the line until just before the second time src= appears, that is, this initial part:
ipv4 2 tcp 6 3598 ESTABLISHED src=192.168.1.117 dst=137.194.2.78 sport=59078 dport=80 packets=4 bytes=298
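The back-reference behaviour is easier to see on a much shorter string. The sketch below is only an illustration: the sample string and variable names are made up, and it assumes the stock Lua string library (5.1 or later). It also hints at why %1 and %3 do not help here: %1 would require the entire first capture to repeat, and %3 refers to a capture that does not exist, which raises an error once the matcher reaches it.

```lua
-- Hypothetical short sample; not taken from a real conntrack file.
local s = "x a=1 b=2 a=3 c=4"

-- Capture 2 grabs the first " key=" token; %2 then demands the same text again.
local head, key = s:match("^(.-( [^ =]+=).-)%2")
print(head)                -- x a=1 b=2
print(("%q"):format(key))  -- " a="

-- %1 is legal at this position (capture 1 is already closed), but it asks for
-- the whole first capture to appear twice in a row, so the match just fails.
local again = s:match("^(.-( [^ =]+=).-)%1")
print(again)               -- nil

-- %3 names a third capture, but the pattern only has two, so Lua raises an
-- "invalid capture index" error as soon as matching reaches that item.
local ok, err = pcall(string.match, s, "^(.-( [^ =]+=).-)%3")
print(ok)                  -- false
```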
If you wanted to get the second half too, and assign it to a different variable, you could replace the line = ... statement with
line1, _, line2 = line:match("^(.-( [^ =]+=).-)(%2.*)$")
This would assign to line1 the first half of the line (as was previously assigned to line), and to line2 the remainder, starting from the second occurrence of " src=". For the example line above, you'd get
line1 = "ipv4 2 tcp 6 3598 ESTABLISHED src=192.168.1.117 dst=137.194.2.78 sport=59078 dport=80 packets=4 bytes=298"
line2 = " src=137.194.2.78 dst=132.227.127.212 sport=80 dport=59078 packets=3 bytes=567 [ASSURED] mark=0 use=2"
Note: The _ in between line1 and line2 is there to catch the second capture (which here is the string " src="). Remember that match returns all captures, in order, whether you want them or not.
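Putting the pieces together, here is a minimal sketch of how the two halves could be combined into one record with rbytes and rpackets. The function name parse_conntrack_line and the field names timeout and state are assumptions made for illustration; this is not the luci.sys implementation, and it assumes every line has the two-direction layout shown in the question.

```lua
-- Hypothetical helper, not part of luci.sys.
local function parse_conntrack_line(line)
  -- Split at the second occurrence of the first " key=" token (" src=").
  local fwd, _, rev = line:match("^(.-( [^ =]+=).-)(%2.*)$")
  if not fwd then return nil end

  local entry = {}
  -- Leading bare words: layer3, its number, layer4, its number, timeout.
  local l3, l4, timeout = fwd:match("^(%S+)%s+%d+%s+(%S+)%s+%d+%s+(%d+)")
  entry.layer3, entry.layer4, entry.timeout = l3, l4, tonumber(timeout)
  -- Optional state word such as ESTABLISHED (absent on the udp lines).
  entry.state = fwd:match("%s(%u[%u_]+)%s")

  -- key=value pairs of the forward half; purely numeric values become numbers.
  for k, v in fwd:gmatch("([^%s=]+)=(%S+)") do
    entry[k] = v:match("^%d+$") and tonumber(v) or v
  end
  -- Only the counters are kept from the reverse half.
  entry.rpackets = tonumber(rev:match("packets=(%d+)"))
  entry.rbytes = tonumber(rev:match("bytes=(%d+)"))
  return entry
end

-- Usage with the first sample line from the question.
local sample = "ipv4 2 tcp 6 3598 ESTABLISHED src=192.168.1.117 dst=137.194.2.78 "
  .. "sport=59078 dport=80 packets=4 bytes=298 src=137.194.2.78 "
  .. "dst=132.227.127.212 sport=80 dport=59078 packets=3 bytes=567 "
  .. "[ASSURED] mark=0 use=2"
local e = parse_conntrack_line(sample)
print(e.layer3, e.layer4, e.timeout, e.state)
print(e.bytes, e.packets, e.rbytes, e.rpackets)
```

Matching the reverse counters directly on the second half avoids having to rename keys after the fact, at the cost of ignoring the reverse-path addresses and ports.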