
Extracting data from text file and import to Elasticsearch using Logstash


I have a text file that I need to import into Elasticsearch. My text file format is:

1            ARsv200711042           Allen                         Alane                         
2            ARsv200711042           Allen                         Arthur                        
3            ARsv200711042           Allen                         Bernice                       
4            ARsv200711042           Allen                         Betty                         
5            ARsv200711042           Allen                         Brittany                      
6            ARsv200711042           Allen                         Bruce                         
7            ARsv200711042           Allen                         Carolyn                       
8            ARsv200711042           Allen                         Carolyn                       
9            ARsv200711042           Allen                         Chadderick                    
10           ARsv200711042           Allen                         Darlene                        

I need to capture the data by position; for example, the first column is eMID, which spans positions 1-13, StateSource is at positions 14-15, CodeProducts is at positions 16-17, and so on.

So I put together a Logstash configuration like this:

input {
    file {
        path => "D:/sample/sample 500.txt"
        start_position => "beginning"
    }
}

filter {
    grok {
        match => { 
            "message" => [
                "(?<eMID>.{0,13})(?<StateSource>.{0,2})(?<CodeProducts>.{0,2})(?<AcquiredDate>.{0,8})(?<Uses>.{0,2})(?<Prefix>.{0,10})(?<LName>.{0,30})(?<FName>.{0,30})"
            ]
        }
    }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "sample-data"
    #user => "elastic"
    #password => "changeme"
  }
}

I was able to import the data successfully. I have the following questions:

  • How do I format the date field? For example, AcquiredDate comes in the format 20071104 and needs to be transformed into a date format that Elasticsearch can analyze.
  • Since we are extracting by position, a lot of trailing whitespace can appear; how do I trim that whitespace?
  • In some cases a few of the columns, e.g. FName (first name) or LName (last name), may contain special characters such as + - && || ! ( ) { } [ ] ^ " ~ * ? : \ etc. How can I also match those with the regex and insert them into Elasticsearch?

Solution

  • OK, so one way is to split 20071104 into parts: assign the first four digits \d{4} to y, the next two digits \d{2} to m, and the remaining two digits \d{2} to d, then frame a date object from those parts.
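    That first approach could be sketched like this (untested, and assuming AcquiredDate has already been extracted by the earlier grok; the y/m/d field names are only illustrative):

    ```
    filter {
      grok {
        # split AcquiredDate (e.g. 20071104) into year / month / day captures
        match => { "AcquiredDate" => "(?<y>\d{4})(?<m>\d{2})(?<d>\d{2})" }
      }
      mutate {
        # frame an ISO-style date string from the captured parts
        add_field => { "AcquiredDateFormatted" => "%{y}-%{m}-%{d}" }
        # drop the temporary captures
        remove_field => ["y", "m", "d"]
      }
    }
    ```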

    Or, the second way is to create a date object from the string and use that object to reformat it, like I did in this example, assuming AcquiredDate is 20071104:

    filter {
        ruby {
            code => '
                # parse the 8-digit string and re-emit it as YYYY-MM-DD
                date = Date.strptime(event.get("AcquiredDate"), "%Y%m%d")
                event.set("new_time", date.strftime("%Y-%m-%d"))
            '
        }
        mutate {
            remove_field => ["host", "@timestamp", "sequence", "message", "@version"]
        }
    }
    

    gives you

    {
        "AcquiredDate" => "20071104",
        "new_time" => "2007-11-04"
    }
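    The Ruby part can be checked on its own outside Logstash; this standalone sketch mimics what the filter's code block does (the acquired_date variable stands in for the event field):

    ```ruby
    require "date"

    # stand-in for event.get("AcquiredDate")
    acquired_date = "20071104"

    # parse the fixed 8-digit form and reformat to an ISO date
    # that Elasticsearch can analyze
    date = Date.strptime(acquired_date, "%Y%m%d")
    new_time = date.strftime("%Y-%m-%d")

    puts new_time  # => 2007-11-04
    ```
    
    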
    

    To answer your second part, use something like this:

    mutate { 
      strip => ["field1withwhitespace", "field2withwhitespace"] 
    }
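    Applied to the fields above, that would turn e.g. "Allen                         " into "Allen". The mutate strip option behaves like Ruby's String#strip, which you can verify standalone (field names here are just placeholders):

    ```ruby
    # simulate the trailing whitespace left over from fixed-width extraction
    lname = "Allen                         "
    fname = "Alane                         "

    # mutate { strip => [...] } does the equivalent of String#strip:
    # removes leading and trailing whitespace
    puts lname.strip  # => "Allen"
    puts fname.strip  # => "Alane"
    ```
    
    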