
How to add an array of dictionaries from a CSV via the Logstash mutate filter?


I have written a Logstash config file to upload a CSV file. Each row of the CSV contains information about multiple applicants, and I need to store them in the Kibana index as an array of dictionaries instead of a dictionary of dictionaries keyed by index.

filter {
    csv {
        separator => ","
        skip_header => true
        columns => [LoanID,Applicant_Income1,Occupation1,Time_At_Work1,Date_Of_Join1,Gender,LoanAmount,Marital_Status,Dependents,Education,Self_Employed,Applicant_Income2,Occupation2,Time_At_Work2,Date_Of_Join2,Applicant_Income3,Occupation3,Time_At_Work3,Date_Of_Join3]
    }
    mutate {
        convert => {
            "Applicant_Income1" => "float"
            "Time_At_Work1" => "float"
            "LoanAmount" => "float"
            "Applicant_Income2" => "float"
            "Time_At_Work2" => "float"
            "Applicant_Income3" => "float"
            "Time_At_Work3" => "float"
        }
    }
    mutate {
        rename => {
            "Applicant_Income1" => "[Applicant][0][Applicant_Income]"
            "Occupation1" => "[Applicant][0][Occupation]"
            "Time_At_Work1" => "[Applicant][0][Time_At_Work]"
            "Date_Of_Join1" => "[Applicant][0][Date_Of_Join]"
            "Applicant_Income2" => "[Applicant][1][Applicant_Income]"
            "Occupation2" => "[Applicant][1][Occupation]"
            "Time_At_Work2" => "[Applicant][1][Time_At_Work]"
            "Date_Of_Join2" => "[Applicant][1][Date_Of_Join]"
            "Applicant_Income3" => "[Applicant][2][Applicant_Income]"
            "Occupation3" => "[Applicant][2][Occupation]"
            "Time_At_Work3" => "[Applicant][2][Time_At_Work]"
            "Date_Of_Join3" => "[Applicant][2][Date_Of_Join]"
        }
    }
    date {
        match => [ "Date_Of_Join1", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
    }
    date {
        match => [ "Date_Of_Join2", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
    }
    date {
        match => [ "Date_Of_Join3", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
    }
}

I got the Applicant field as

[Figure 1: screenshot of the resulting Applicant field, an object with sub-fields 0, 1 and 2 (image not available)]

But I need the Applicant field to be an array of dictionaries, like

[Screenshot of the desired Applicant field, an array of objects (image not available)]

I also tried add_field, but it did not work:

    mutate {
        add_field => {
            "[Applicant][Applicant_Income1]" => "Applicant_Income1",
            "[Applicant][Occupation1]" => "Occupation1",
            "[Applicant][Time_At_Work1]" => "Time_At_Work1",
            "[Applicant][Date_Of_Join1]" => "Date_Of_Join1"
        }
    }

Solution

  • Square brackets in Logstash field references do not behave like array indices as they do in other programming languages, e.g. Java.

    [Applicant][0][Applicant_Income]

    does not set the Applicant_Income field of the first (zero-based) element of an Applicant array. Because the Applicant field does not exist yet, Logstash instead creates sub-fields named 0, 1 and 2 underneath the Applicant field, as shown in Figure 1.
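
    The difference can be sketched in plain Ruby, outside Logstash. The values are made up and `object_to_array` is a hypothetical helper, not part of Logstash. The sketch puts the keyed object the rename produces next to the array the question asks for, and shows that the former can be turned into the latter by sorting on the numeric keys, which is the kind of transformation the ruby filter lets you perform:

    ```ruby
    require "json"

    # Hypothetical helper: turns {"0" => {...}, "1" => {...}} into [{...}, {...}],
    # ordering the elements by the numeric value of their keys.
    def object_to_array(keyed)
      keyed.sort_by { |key, _value| key.to_i }.map { |_key, value| value }
    end

    # Shape produced by renaming into "[Applicant][0][...]" and "[Applicant][1][...]"
    # (made-up sample values): an object with the string keys "0" and "1".
    keyed_object = {
      "0" => { "Applicant_Income" => 5000.0, "Occupation" => "Engineer" },
      "1" => { "Applicant_Income" => 3000.0, "Occupation" => "Teacher" }
    }

    # Shape the question asks for: an array of objects.
    applicants = object_to_array(keyed_object)

    puts JSON.generate({ "Applicant" => keyed_object }) # "Applicant" is a JSON object
    puts JSON.generate({ "Applicant" => applicants })   # "Applicant" is a JSON array
    ```

    Sorting with `to_i` instead of on the raw strings keeps "10" after "2" if a row ever has more than ten applicants.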

    To create an array of objects, use the ruby filter plugin (https://www.elastic.co/guide/en/logstash/current/plugins-filters-ruby.html). Because it lets you execute arbitrary Ruby code, it gives you more control:

    filter {
      csv {
        separator => ","
        skip_header => true
        columns => [LoanID,Applicant_Income1,Occupation1,Time_At_Work1,Date_Of_Join1,Gender,LoanAmount,Marital_Status,Dependents,Education,Self_Employed,Applicant_Income2,Occupation2,Time_At_Work2,Date_Of_Join2,Applicant_Income3,Occupation3,Time_At_Work3,Date_Of_Join3]
      }
    
      mutate { 
        convert => {
          "Applicant_Income1" => "float"
          "Time_At_Work1" => "float"
          "LoanAmount" => "float"
          "Applicant_Income2" => "float"
          "Time_At_Work2" => "float"
          "Applicant_Income3" => "float"
          "Time_At_Work3" => "float"
        }
      } 
    
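      # Parse the join dates first. "target" writes each parsed date back into
      # its own field; without it, every date filter overwrites @timestamp.
      date {
        match => [ "Date_Of_Join1", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
        target => "Date_Of_Join1"
      }
    
      date {
        match => [ "Date_Of_Join2", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
        target => "Date_Of_Join2"
      }
    
      date {
        match => [ "Date_Of_Join3", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
        target => "Date_Of_Join3"
      }
    
      # Build one object per applicant from the numbered columns and store
      # them as an array in the "Applicant" field.
      ruby {
        code => '
          applicants = (1..3).map do |i|
            {
              "Applicant_Income" => event.get("Applicant_Income#{i}"),
              "Occupation"       => event.get("Occupation#{i}"),
              "Time_At_Work"     => event.get("Time_At_Work#{i}"),
              "Date_Of_Join"     => event.get("Date_Of_Join#{i}")
            }
          end
          event.set("Applicant", applicants)
        '
      }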
    
      mutate{
        remove_field => [
          "Applicant_Income1",
          "Occupation1",
          "Time_At_Work1",
          "Date_Of_Join1",
          "Applicant_Income2",
          "Occupation2",
          "Time_At_Work2",
          "Date_Of_Join2",
          "Applicant_Income3",
          "Occupation3",
          "Time_At_Work3",
          "Date_Of_Join3"
        ]
      } 
    }
    

    event.set adds a field to the event (the document). The first argument is the field name, the second its value. Here, it adds the field "Applicant" with an array of objects as its value.

    event.get returns the value of a field in the event; you pass the field name as the argument.
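
    To try the get/set logic without running Logstash, here is a plain-Ruby sketch. `FakeEvent` and `build_applicants` are hypothetical names: `FakeEvent` only mimics flat `get`/`set` on a Hash and is not the real LogStash::Event, and the sample row is made up:

    ```ruby
    require "json"

    # Hypothetical stand-in for LogStash::Event, for experimenting outside Logstash.
    # It only mimics flat get/set on top-level field names; the real Event API also
    # understands nested field references such as "[Applicant][0]".
    class FakeEvent
      def initialize(fields)
        @fields = fields.dup
      end

      def get(name)
        @fields[name]
      end

      def set(name, value)
        @fields[name] = value
      end
    end

    # Reads the numbered columns with event.get and stores the resulting
    # array of objects with event.set, as the ruby filter does.
    def build_applicants(event, count = 3)
      applicants = (1..count).map do |i|
        {
          "Applicant_Income" => event.get("Applicant_Income#{i}"),
          "Occupation"       => event.get("Occupation#{i}"),
          "Time_At_Work"     => event.get("Time_At_Work#{i}"),
          "Date_Of_Join"     => event.get("Date_Of_Join#{i}")
        }
      end
      event.set("Applicant", applicants)
    end

    # Made-up row with two applicants; the third applicant's columns are absent.
    event = FakeEvent.new({
      "LoanID" => "L001",
      "Applicant_Income1" => 5000.0, "Occupation1" => "Engineer",
      "Time_At_Work1" => 3.5, "Date_Of_Join1" => "2015-06-01T09:00:00.00+00:00",
      "Applicant_Income2" => 3000.0, "Occupation2" => "Teacher",
      "Time_At_Work2" => 1.0, "Date_Of_Join2" => "2019-01-15T09:00:00.00+00:00"
    })

    build_applicants(event)
    puts JSON.pretty_generate(event.get("Applicant"))
    ```

    Note that the third object contains only nulls, because event.get returns nil for missing fields. If some rows have fewer applicants, something like `applicants.reject { |a| a.values.all?(&:nil?) }` before event.set would drop those empty entries.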

    For more details, see the Event API documentation: https://www.elastic.co/guide/en/logstash/current/event-api.html

    I hope this helps.