elasticsearch, geoip, update-by-query

Add GeoIP data to existing documents in an Elasticsearch index


I recently added a GeoIP processor to my ingest pipeline in Elasticsearch. This works well and adds the new fields to newly ingested documents. I would also like to add the GeoIP fields to older data by running _update_by_query on an index; however, it does not seem to accept "processors" as a parameter.

What I want to do is something like this:

POST my_index*/_update_by_query
{
 "refresh": true,
 "processors": [
   {
     "geoip" : {
        "field": "doc['client_ip']",
        "target_field" : "geo",
        "database_file" : "GeoLite2-City.mmdb",
        "properties":["continent_name", "country_iso_code", "country_name", "city_name", "timezone", "location"]
    }
   }
 ],
 "script": {
  "day_of_week": {
    "type": "long",
    "script": "emit(doc['@timestamp'].value.withZoneSameInstant(ZoneId.of(doc['geo.timezone'])).getDayOfWeek().getValue())"
  },
  "hour_of_day": {
    "type": "long",
    "script": "emit(doc['@timestamp'].value.withZoneSameInstant(ZoneId.of(doc['geo.timezone'])).getHour())"
  },
  "office_hours": {
    "script": "if (doc['day_of_week'].value< 6 && doc['day_of_week'].value > 0) {if (doc['hour_of_day'].value> 7 && doc['hour_of_day'].value<19) {return 1;} else {return -1;} } else {return -1;}"
  }
 }
}

I receive the following error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parse_exception",
        "reason" : "Expected one of [source] or [id] fields, but found none"
      }
    ],
    "type" : "parse_exception",
    "reason" : "Expected one of [source] or [id] fields, but found none"
  },
  "status" : 400
}

Solution

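  • The request body of _update_by_query has no "processors" section, and the parse_exception comes from the "script" block, which must be a single script with a source or id. Since you already have the ingest pipeline in place, you simply need to reference it through the pipeline query parameter in your call to the _update_by_query endpoint, like this: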

    POST my_index*/_update_by_query?pipeline=my-pipeline
                                        ^
                                        |
                                     add this
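
For reference, the whole sequence might look like the sketch below. It assumes the pipeline is called my-pipeline (the name used in the call above) and reuses the geoip settings from the question, with "field" given as the plain field name client_ip rather than doc['client_ip']. The ignore_missing option, the refresh query parameter, and the query clause that limits the update to documents that have client_ip but no geo field yet are additions for illustration, not part of the original answer; adjust or drop them as needed.

```
# Ingest pipeline with the geoip processor (skip this step if it already exists).
PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "geoip": {
        "field": "client_ip",
        "target_field": "geo",
        "database_file": "GeoLite2-City.mmdb",
        "properties": ["continent_name", "country_iso_code", "country_name", "city_name", "timezone", "location"],
        "ignore_missing": true
      }
    }
  ]
}

# Re-run the existing documents through the pipeline.
# The query is an assumed filter: only documents with client_ip and without geo.
POST my_index*/_update_by_query?pipeline=my-pipeline&refresh=true
{
  "query": {
    "bool": {
      "filter": { "exists": { "field": "client_ip" } },
      "must_not": { "exists": { "field": "geo" } }
    }
  }
}
```

With a filter like this, the request only picks up documents that have not been enriched yet, so it can be run again if it is interrupted.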