
Logstash grok filter help - fixed position file


I have a fixed-position (column) file with no delimiter separating the fields. Each field has its own start position and length. Here is an example of the data:

520140914191193386---------7661705508623855646---1595852965---133437--the lazy fox jumping over-----------------------212.75.12.85---

I used dashes (-) in the sample above to make the padding visible. In the actual file, any field shorter than its schema length is padded with spaces.

The schema in this case is:

UsedID (start position 1, length 27)
SystemID (start position 28, length 22)
SampleID (start position 50, length 13)
LineID (start position 63, length 8)
Text (start position 71, length 48)
IP (start position 119, length 15)

Ideally, I would get the following field values in Logstash (without trailing spaces):

UsedID:520140914191193386
SystemID:7661705508623855646
SampleID:1595852965
LineID:133437
Text:the lazy fox jumping over
IP:212.75.12.85

How do I parse this kind of file with grok?


Solution

  • I'd go for a two-step process:

    • Split data into fields
    • Strip the padding from the end of each field

    Since each field has a known length, you can use a regex pattern like .{27} to match them.

    In grok, you can name a field like so: (?<user_id>.{27})

    You can test a full pattern in the grok debugger, but something like this should achieve a length-based split:

    (?<user_id>.{27})(?<system_id>.{22})(?<sample_id>.{13})(?<line_id>.{8})(?<text>.{48})(?<ip>.{15})
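
One way to avoid typos in the widths is to generate the pattern from the schema instead of typing it by hand. The Python sketch below is only an illustration and is not part of the Logstash configuration. `SCHEMA` and `build_grok_pattern` are hypothetical names, and the field names are the ones used in this answer. It also checks that each field starts where the previous one ends, as the schema in the question implies.

```python
# Schema from the question: (field name, 1-based start position, length).
# The names follow this answer; they are not defined by Logstash.
SCHEMA = [
    ("user_id", 1, 27),
    ("system_id", 28, 22),
    ("sample_id", 50, 13),
    ("line_id", 63, 8),
    ("text", 71, 48),
    ("ip", 119, 15),
]


def build_grok_pattern(schema):
    """Return a grok-style pattern, verifying that the fields are contiguous."""
    expected_start = 1
    parts = []
    for name, start, length in schema:
        if start != expected_start:
            raise ValueError(
                f"{name}: expected start {expected_start}, got {start}"
            )
        # One named group per field, matching exactly `length` characters.
        parts.append(f"(?<{name}>.{{{length}}})")
        expected_start = start + length
    return "".join(parts)


pattern = build_grok_pattern(SCHEMA)
print(pattern)
```

If a start position and a length ever disagree, the function raises an error rather than producing a pattern that silently shifts every later field.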
    

    You mentioned that the extra characters are all whitespace, so you can clean them up with the mutate filter's strip option, which removes leading and trailing whitespace from the listed fields.

    All together, that might look something like this:

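    filter {
        # Step 1: split the line into fixed-width fields by length.
        grok {
            match => ["message", "(?<user_id>.{27})(?<system_id>.{22})(?<sample_id>.{13})(?<line_id>.{8})(?<text>.{48})(?<ip>.{15})"]
        }

        # Step 2: remove the space padding from each extracted field.
        mutate {
            strip => [
                "user_id",
                "system_id",
                "sample_id",
                "line_id",
                "text",
                "ip"
            ]
        }
    }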
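
To see what the two steps should produce, here is a rough Python simulation of the same split-then-strip logic, run on the sample record from the question. It is a sketch, not Logstash itself: `parse_line` and `sample` are hypothetical names, and Python writes named groups as `(?P<name>...)` where grok uses `(?<name>...)`. The sample is rebuilt with space padding, as the question describes the real file.

```python
import re

# Same widths as the grok pattern; Python needs (?P<name>...) for named groups.
PATTERN = re.compile(
    r"(?P<user_id>.{27})(?P<system_id>.{22})(?P<sample_id>.{13})"
    r"(?P<line_id>.{8})(?P<text>.{48})(?P<ip>.{15})"
)


def parse_line(line):
    """Split a fixed-width line into fields, then strip the padding."""
    match = PATTERN.match(line)
    if match is None:
        return None  # the line is shorter than the 133 characters required
    return {name: value.strip() for name, value in match.groupdict().items()}


# Sample record from the question, padded with spaces to each field's width.
sample = (
    "520140914191193386".ljust(27)
    + "7661705508623855646".ljust(22)
    + "1595852965".ljust(13)
    + "133437".ljust(8)
    + "the lazy fox jumping over".ljust(48)
    + "212.75.12.85".ljust(15)
)

print(parse_line(sample))
```

One thing this makes visible: the pattern needs the full 133 characters. If something upstream trims the trailing spaces from a line, the last group no longer has 15 characters to match and the line fails to parse, so it is worth checking that the padding survives until the filter runs.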