Tags: elasticsearch, logstash, logstash-configuration, logstash-filter

How to place split array objects at the root level of an Elasticsearch document using the Logstash split filter plugin


I want to put the objects produced by splitting an array at the root level of the Elasticsearch document, using the split filter plugin in Logstash.

Here are my input source, my current Logstash configuration, the output I get, and the output I want to get.

[ INPUT source from my REST API server ]

{
  "rid": "dc755b8d-14bf-4211-a820-9aab01e5475b",
  "rval": {
    "totalCount": 3,
    "data": [
      {
        "id": 1,
        "name": "Object1",
        "category": "Object1",
        "sequence": 1
      },
      {
        "id": 2,
        "name": "Object2",
        "sequence": 2
      },
      {
        "id": 3,
        "name": "Object3",
        "sequence": 3
      }
    ]
  }
}

[ Logstash Config ]

input {
  http_poller {
    urls => {
      test => {
        method => post
        url => "http://localhost/object"
        body => '{"category": "test"}'
        headers => {
          "Content-Type" => "application/json"
        }
      }
    }
    request_timeout => 60
    schedule => { cron => "* * * * * UTC" }
    codec => "json"
  }
}

filter {
  split {
    field => "[rval][data]"
    target => "test"
  }
  mutate {
    remove_field => [ "rid", "rval", "event", "success", "dateTime", "@version", "@timestamp" ]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test"
    # Use each object's id as the document_id so that every object can be identified.
    document_id => "%{[test][id]}"
  }
  stdout {
    codec => rubydebug
  }
}

[ OUTPUT ]

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "test",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "test": {
            "id": 1,
            "name": "Object1",
            "category": "Object1",
            "sequence": 1
          }
        }
      },
      {
        "_index": "test",
        "_id": "2",
        "_score": 1.0,
        "_source": {
          "test": {
            "id": 2,
            "name": "Object2",
            "category": "Object2",
            "sequence": 2
          }
        }
      },
      {
        "_index": "test",
        "_id": "3",
        "_score": 1.0,
        "_source": {
          "test": {
            "id": 3,
            "name": "Object3",
            "category": "Object3",
            "sequence": 3
          }
        }
      }
    ]
  }
}

[ OUTPUT I want to get ]

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "test",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "id": 1,
          "name": "Object1",
          "category": "Object1",
          "sequence": 1
        }
      },
      {
        "_index": "test",
        "_id": "2",
        "_score": 1.0,
        "_source": {
          "id": 2,
          "name": "Object2",
          "category": "Object2",
          "sequence": 2
        }
      },
      {
        "_index": "test",
        "_id": "3",
        "_score": 1.0,
        "_source": {
          "id": 3,
          "name": "Object3",
          "category": "Object3",
          "sequence": 3
        }
      }
    ]
  }
}

The difference is the structure under hits.hits[]._source.

I want each split array object to be placed directly in hits.hits[]._source, without the "test" key.

To get rid of the "test" key, I removed the setting below:

target => "test"

But then the field value (rval.data) is created as a key under hits.hits[]._source, like below.

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "test",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "rval": {
            "totalCount": 3,
            "data": [
              {
                "id": 1,
                "name": "Object1",
                "category": "Object1",
                "sequence": 1
              },
              {
                "id": 2,
                "name": "Object2",
                "sequence": 2
              },
              {
                "id": 3,
                "name": "Object3",
                "sequence": 3
              }
            ]
          }
        }
      }
    ]
  }
}

How should I modify the Logstash configuration to get the output I expect?


Solution

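  • What you could do is simply rename the fields in the filter below. Because the [test] object is removed at the end, the output must then use document_id => "%{id}" instead of "%{[test][id]}".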

    filter {
      split {
        field => "[rval][data]"
        target => "test"
      }
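      mutate {
        # rename is applied before remove_field, so the values are moved out of [test] before [test] is dropped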
        rename => {
           "[test][id]" => "id"
           "[test][name]" => "name"
           "[test][category]" => "category"
           "[test][sequence]" => "sequence"
        }
        remove_field => [ "test", "rid", "rval", "event", "success", "dateTime", "@version", "@timestamp" ]
      }
    }
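
  • If you would rather not list every field by name, a ruby filter can copy whatever keys the split object has to the root of the event. This is a sketch that assumes Logstash's Ruby event API (event.get, event.set, event.remove) and the field names from the question; it has not been tested against your data, and keys that collide with Logstash's own fields (such as @timestamp) would need extra handling.

    filter {
      split {
        field => "[rval][data]"
        target => "test"
      }
      # Copy each key of the split object to the event root, then drop the wrapper.
      ruby {
        code => '
          obj = event.get("test")
          if obj.is_a?(Hash)
            obj.each { |key, value| event.set(key, value) }
          end
          event.remove("test")
        '
      }
      mutate {
        remove_field => [ "rid", "rval", "event", "success", "dateTime", "@version", "@timestamp" ]
      }
    }

    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "test"
        # The id now lives at the root of the event.
        document_id => "%{id}"
      }
    }

    As with the rename approach, document_id has to reference the root-level id once [test] is gone; otherwise the reference does not resolve and every object ends up with the same document ID.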