
Parsing out multiple lines with LPeg in Lua


I have a text file containing multi-line blocks like this:

2011/01/01 13:13:13,<AB>, Some Certain Text,=,
[    
certain text
         [
                  0: 0 0 0 0 0 0 0 0 
                  8: 0 0 0 0 0 0 0 0 
                 16: 0 0 0 9 343 3938 9433 8756 
                 24: 6270 4472 3182 2503 1768 1140 836 496 
                 32: 326 273 349 269 144 121 94 82 
                 40: 64 80 66 59 56 47 50 46 
                 48: 64 35 42 53 42 40 41 34 
                 56: 35 41 39 39 47 30 30 39 
                 Total count: 12345
        ]
    certain text
]
some text
2011/01/01 14:14:14,<AB>, Some Certain Text,=,
[
 certain text
   [
              0: 0 0 0 0 0 0 0 0 
              8: 0 0 0 0 0 0 0 0 
             16: 0 0 0 4 212 3079 8890 8941 
             24: 6177 4359 3625 2420 1639 974 594 438 
             32: 323 286 318 296 206 132 96 85 
             40: 65 73 62 53 47 55 49 52 
             48: 29 44 44 41 43 36 50 36 
             56: 40 30 29 40 35 30 25 31 
             64: 47 31 25 29 24 30 35 31 
             72: 28 31 17 37 35 30 20 33 
             80: 28 20 37 25 21 23 25 36 
             88: 27 35 22 23 15 24 34 28
             Total count: 123456 
    ]
    certain text
some text
]

These variable-length blocks are interspersed with other text. I want to read out all the numbers after each colon and keep them in a separate array per block. In this case, there will be two arrays:

array1 = { 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 343 3938 9433 8756 6270 4472 3182 2503 1768 1140 836 496 326 273 349 269 144 121 94 82 64 80 66 59 56 47 50 46 64 35 42 53 42 40 41 34 35 41 39 39 47 30 30 39 12345 }

array2 = { 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 212 3079 8890 8941 6177 4359 3625 2420 1639 974 594 438 323 286 318 296 206 132 96 85 65 73 62 53 47 55 49 52 29 44 44 41 43 36 50 36 40 30 29 40 35 30 25 31 47 31 25 29 24 30 35 31 28 31 17 37 35 30 20 33 28 20 37 25 21 23 25 36 27 35 22 23 15 24 34 28 123456 }

I found that LPeg may be a lightweight way to achieve this, but I'm totally new to PEGs and LPeg. Please help!


Solution

  • LPeg version:

    local lpeg            = require "lpeg"
    local lpegmatch       = lpeg.match
    local C, Ct, P, R, S  = lpeg.C, lpeg.Ct, lpeg.P, lpeg.R, lpeg.S
    local Cg              = lpeg.Cg
    
    local data_to_arrays
    
    do
      local colon    = P":"
      local lbrak    = P"["
      local rbrak    = P"]"
      local digits   = R"09"^1
      local eol      = P"\n\r" + P"\r\n" + P"\n" + P"\r"
      local ws       = S" \t\v"
      local optws    = ws^0
      local getnum   = C(digits) / tonumber * optws
      local start    = lbrak * optws * eol
      local stop     = optws * rbrak
      local line     = optws * digits * colon * optws
                     * getnum * getnum * getnum * getnum
                     * getnum * getnum * getnum * getnum
                     * eol
      local count    = optws * P"Total count:" * optws * getnum * eol
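      -- all row values, plus the optional total as the last item, go into one table
      local inner    = Ct(line^1 * count^-1)
      -- alternative: store the total in a named field "count" instead
    --local inner    = Ct(line^1 * Cg(count, "count")^-1)
      local array    = start * inner * stop
      -- skip any character that does not start a data block
      local extract  = Ct((array + 1)^0)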
    
      data_to_arrays = function (data)
        return lpegmatch (extract, data)
      end
    end
    

    Note that this works only if there are exactly eight integers on each line of the data block. Depending on how well-formed your input is, this may be a curse or a blessing ;-)
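
    If the eight-per-line restriction is unwanted, one possible relaxation is to accept one or more integers per row with getnum^1. The following is only a minimal sketch under that assumption; the names extract_any_width, sample and result are made up for illustration and are not part of the code above.

```lua
local lpeg = require "lpeg"
local C, Ct, P, R, S = lpeg.C, lpeg.Ct, lpeg.P, lpeg.R, lpeg.S

local digits = R"09"^1
local eol    = P"\r\n" + P"\n\r" + P"\n" + P"\r"
local optws  = S" \t\v"^0
local getnum = C(digits) / tonumber * optws

-- One data row: "<offset>: n n n ..." with one or more integers (assumption:
-- rows may have any width, unlike the fixed eight in the original grammar).
local line   = optws * digits * P":" * optws * getnum^1 * eol
local count  = optws * P"Total count:" * optws * getnum * eol
local inner  = Ct(line^1 * count^-1)
local array  = P"[" * optws * eol * inner * optws * P"]"

-- Hypothetical name: collects one table per data block, skipping other text.
local extract_any_width = Ct((array + 1)^0)

local sample = "text\n[\n 0: 1 2 3\n 3: 4 5\n Total count: 15\n]\n"
local result = lpeg.match(extract_any_width, sample)
```

    With this variant a malformed row (too few or too many values) is silently accepted, so the strict version may still be preferable when the input is expected to be regular.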

    And a test file:

    local data = [[
    some text
    [    
    some text
             [
                      0: 0 0 0 0 0 0 0 0 
                      8: 0 0 0 0 0 0 0 0 
                     16: 0 0 0 9 343 3938 9433 8756 
                     24: 6270 4472 3182 2503 1768 1140 836 496 
                     32: 326 273 349 269 144 121 94 82 
                     40: 64 80 66 59 56 47 50 46 
                     48: 64 35 42 53 42 40 41 34 
                     56: 35 41 39 39 47 30 30 39 
                     Total count: 12345
            ]
        some text
    ]
    some text
    [
     some text
       [
                  0: 0 0 0 0 0 0 0 0 
                  8: 0 0 0 0 0 0 0 0 
                 16: 0 0 0 4 212 3079 8890 8941 
                 24: 6177 4359 3625 2420 1639 974 594 438 
                 32: 323 286 318 296 206 132 96 85 
                 40: 65 73 62 53 47 55 49 52 
                 48: 29 44 44 41 43 36 50 36 
                 56: 40 30 29 40 35 30 25 31 
                 64: 47 31 25 29 24 30 35 31 
                 72: 28 31 17 37 35 30 20 33 
                 80: 28 20 37 25 21 23 25 36 
                 88: 27 35 22 23 15 24 34 28 
        ]
        some text
    some text
    ]
    ]]
    
    local arrays = data_to_arrays (data)
    
    for n = 1, #arrays do
      local ar   = arrays[n]
      local size = #ar
      io.write (string.format ("[%d] = { --[[size: %d items]]\n  ", n, size))
      for i = 1, size do
        io.write (string.format ("%d,%s", ar[i], (i % 5 == 0) and "\n  " or " "))
      end
      -- ar.count is only set when the Cg(count, "count") variant of inner is used
      if ar.count ~= nil then
        io.write (string.format ("\n  [\"count\"] = %d,", ar.count))
      end
      io.write ("\n}\n")
    end
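
    The ar.count check in the loop above only fires when the total is captured as a named group, as in the commented-out definition of inner. A small self-contained sketch of that idea follows; the names block and t are hypothetical, and for brevity the row pattern here accepts any number of integers rather than exactly eight.

```lua
local lpeg = require "lpeg"
local C, Cg, Ct, P, R, S = lpeg.C, lpeg.Cg, lpeg.Ct, lpeg.P, lpeg.R, lpeg.S

local digits = R"09"^1
local eol    = P"\r\n" + P"\n\r" + P"\n" + P"\r"
local optws  = S" \t\v"^0
local getnum = C(digits) / tonumber * optws

-- Assumption for this sketch: rows may hold any number of integers.
local line   = optws * digits * P":" * optws * getnum^1 * eol
local count  = optws * P"Total count:" * optws * getnum * eol

-- Cg(count, "count") makes the total a named field of the table built by Ct,
-- so it no longer appears among the positional row values.
local inner  = Ct(line^1 * Cg(count, "count")^-1)
local block  = P"[" * optws * eol * inner * optws * P"]"

-- The positional entries of t hold the row values; t.count holds the total.
local t = lpeg.match(block, "[\n 0: 7 8\n Total count: 15\n]")
```

    Keeping the total out of the positional part means #t reflects only the row values, which may be convenient if the total is later used as a checksum.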