json elasticsearch mvel

Elasticsearch Numeric Id Generation


I am using Elasticsearch to store documents inserted by several different client applications. Because more than one client inserts, I can't just keep the next id locally; I need to look up the next expected id in Elasticsearch. My id generation scheme is based on integers, as in many of the "twitter" examples on the site. My question is how best to look up the last id. The id is stored as a string, so a sort operation such as the following does not work:

curl -XGET 'http://localhost:4040/search/geolocations/geos/_search' -d '{
  "sort": [
    {
      "_id": {
        "order": "asc"
      }
    }
  ],
  "query": {
    "match_all": {}
  }
}'

With the above, if the ids 1, 2, 10 and 11 were stored, the result would rank "2" as the highest, which is correct for strings but not for integers.
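The ordering difference can be reproduced outside Elasticsearch. A minimal Python sketch, using the four ids from the example above (nothing here talks to a cluster):

```python
# Ids as they are stored: strings, not integers.
ids = ["1", "2", "10", "11"]

# A plain string sort compares character by character, so "10" < "2".
as_strings = sorted(ids)

# Using int as the sort key compares the ids by numeric value instead.
as_integers = sorted(ids, key=int)

print(as_strings)
print(as_integers)
```

The string sort ends with "2", while the integer-keyed sort ends with "11".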

I would like to stick with integer ids here rather than switch to a traditional string UUID.

I have been considering using: http://www.elasticsearch.org/guide/reference/query-dsl/script-filter/

to run a script that casts the id string to an integer, but that also seems like a bad approach, and it is unclear to me how it would work with the combined JSON and MVEL syntax.

I made an attempt with:

curl -XGET 'http://localhost:4040/search/geolocations/geos/_search' -d '{
  "sort": {
    "_script": {
      "script": "doc['_id'].value",
      "type": "number",
      "order": "asc"
    }
  },
  "query": {
    "match_all": {}
  }
}'

but it does not parse.
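One possible contributing factor, offered as an observation rather than a confirmed diagnosis: the request body is wrapped in single quotes for the shell, so the single quotes inside doc['_id'] end and reopen the shell string and never reach Elasticsearch. The sketch below uses Python's shlex module, which follows POSIX shell quoting rules, to show what a shortened version of the argument looks like after quote processing.

```python
import shlex

# Shortened form of the -d argument from the attempt above.
command = """curl -d '{"script": "doc['_id'].value"}'"""

# shlex.split applies POSIX-style quoting rules, as a shell would.
args = shlex.split(command)
body = args[-1]

# The inner single quotes are gone: the script now reads doc[_id].value
print(body)
```

Putting the body in a file and passing it with -d @file, or writing each inner quote as '\'' on the command line, keeps the quotes intact. Whether doc['_id'] is then usable in a sort script depends on the Elasticsearch version and mapping, which the post does not state.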

One more note: I expect adding new records to be a rather infrequent operation, so performance is not critical here. I would rather use an expensive query than reinvent the wheel by switching everything over to a different, non-integer id scheme.
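Since an expensive lookup is acceptable here, one option is to fetch the ids and compare them as integers on the client. The sketch below assumes a response shaped like a standard search result (hits.hits, each entry carrying a string _id); the function name next_id and the sample response are hypothetical. It does nothing to stop two clients from computing the same next id at the same time.

```python
def next_id(search_response):
    """Return the next integer id, given a search response dict."""
    hits = search_response["hits"]["hits"]
    # Compare ids numerically; skip any id that is not a plain integer.
    numeric_ids = [int(hit["_id"]) for hit in hits if hit["_id"].isdigit()]
    # With no numeric ids at all, start counting from 1.
    return max(numeric_ids, default=0) + 1


# Hypothetical response containing the ids from the example above.
sample_response = {
    "hits": {"hits": [{"_id": "1"}, {"_id": "2"}, {"_id": "10"}, {"_id": "11"}]}
}
print(next_id(sample_response))
```

For the maximum to be correct, the query has to return every id, so the requested size must cover the whole index.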


Solution

  • I was able to use queries of the form: "from":0,"size" : 5,"query" : {"match_all" : {}}

    which ignore the id, to get the behavior I was after. It was not clear to me from the API that you get the same records 0-4 back even though no "id" explicitly identifies records 0-4. In fact, I am now just using string UUIDs for testing.
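For the string-UUID route mentioned above, each client can generate ids on its own, without asking Elasticsearch for the last one. A minimal sketch using Python's standard uuid module; the helper name new_document_id is hypothetical:

```python
import uuid


def new_document_id():
    """Return a random (version 4) UUID as a string id."""
    return str(uuid.uuid4())


first = new_document_id()
second = new_document_id()
print(first)
print(second)
```

The trade-off is that such ids carry no insertion order, so "the last id" stops being a meaningful question.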