Elasticsearch 2.x index mapping _id


I ran Elasticsearch 1.x (happily) for over a year. Now it's time to upgrade to 2.1.x. The nodes should be turned off and then turned on again one by one. Seems easy enough.
But then I ran into trouble. The major problem is the field _uid, which I set myself so that I could compute the exact location of a document from any other document (by hashing a value). This way I knew that only that exact document would be returned. During the upgrade I got:

MapperParsingException[Field [_uid] is a metadata field and cannot be added inside a document. Use the index API request parameters.]

But when I try to map my former _uid to _id (which should also be good enough) I get a similar error.

I used the _uid field because a lookup by it is a lot faster than a termsQuery (or the like).
How can I still use the _uid or _id field in each document for fast (and exact) lookup of specific documents? Note that I have to fetch thousands of exact documents at a time, so I need an ID-like query. It may also happen that a document with the given _uid or _id does not exist (in that case I want, as now, a 'false-like' result).

Note: The upgrade from 1.x to 2.x is pretty big (filters gone, no dots in field names, no default access to _xxx fields).

Update (to no avail):
Updating the mapping of _uid or _id using:

final XContentBuilder mappingBuilder = XContentFactory.jsonBuilder().startObject().startObject(type).startObject("_id").field("enabled", "true").field("default", "xxxx").endObject()
            .endObject().endObject();
 CLIENT.admin().indices().prepareCreate(index).addMapping(type, mappingBuilder)
                .setSettings(Settings.settingsBuilder().put("number_of_shards", nShards).put("number_of_replicas", nReplicas)).execute().actionGet();

results in:

MapperParsingException[Failed to parse mapping [XXXX]: _id is not configurable]; nested: MapperParsingException[_id is not configurable];

Update: Changed the name to _id instead of _uid, since the latter is built from _type#_id. So I'd need to be able to write to _id.
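
For illustration, a minimal sketch of the _type#_id composition mentioned above. The class and method names are hypothetical and are not part of the Elasticsearch API; the sketch only shows why writing a _uid really comes down to writing an _id for a given type.

```java
// Hypothetical helper, not an Elasticsearch class: models "_uid = _type#_id".
final class UidSketch {
    private static final char DELIMITER = '#';

    private UidSketch() {
    }

    /** Joins a type and an id into the "_type#_id" form. */
    static String compose(final String type, final String id) {
        return type + DELIMITER + id;
    }

    /** Splits a uid at the first '#' and returns {type, id}. */
    static String[] split(final String uid) {
        final int pos = uid.indexOf(DELIMITER);
        if (pos < 0) {
            throw new IllegalArgumentException("not of the form type#id: " + uid);
        }
        return new String[] { uid.substring(0, pos), uid.substring(pos + 1) };
    }
}
```

With this, UidSketch.compose("my_type", "1") gives "my_type#1", and splitting that string gives back the type and the id.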


Solution

  • Since there appears to be no way to set _uid or _id inside the document, I'll post my solution. I renamed the _uid field in all documents to uid (for internal referencing). At some point it dawned on me that you can set the id through the index request itself.

    To bulk insert documents with an id:

    final BulkRequestBuilder builder = client.prepareBulk();
    for (final Doc doc : docs) {
        builder.add(client.prepareIndex(index, type, doc.getId()).setSource(doc.toJson()));
    }
    final BulkResponse bulkResponse = builder.execute().actionGet();
    

    Notice the third argument: it is the document id. It may be null (or left out by using the two-argument variant); in that case the id is generated by ES.
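
The id passed as the third argument can be any string, so the hashed value mentioned in the question can be used directly. A minimal sketch, assuming SHA-256 as the hash (the hash actually used is not stated) and a hypothetical helper DocIds.idFor:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a deterministic document id from a value.
final class DocIds {
    private DocIds() {
    }

    /** Returns the lowercase hex SHA-256 digest of the UTF-8 bytes of value. */
    static String idFor(final String value) {
        try {
            // Assumption: SHA-256; any stable hash would serve the same purpose.
            final MessageDigest digest = MessageDigest.getInstance("SHA-256");
            final byte[] hash = digest.digest(value.getBytes(StandardCharsets.UTF_8));
            final StringBuilder hex = new StringBuilder(hash.length * 2);
            for (final byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (final NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

The same input always yields the same id, so the id of a document can be recomputed later without searching for it, and the result can be what doc.getId() returns in the bulk request above.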
    To then get some documents by id you can:

    final List<String> uids = getUidsFromSomeMethod(); // ids of the documents to get
    final MultiGetRequestBuilder builder = CLIENT.prepareMultiGet();
    builder.add(index_name, type, uids);
    final MultiGetResponse multiResponse = builder.execute().actionGet();
    for (final MultiGetItemResponse response : multiResponse.getResponses()) {
        // getResponse() is null when the lookup of this item failed
        final boolean exists = !response.isFailed() && response.getResponse().isExists();
        if (only_want_to_know_whether_it_exists) {
            // in this case I simply want to know whether the doc exists
            exist.add(exists);
        } else if (exists) {
            // retrieve the doc as json (per item, not from the request builder)
            final String string = response.getResponse().getSourceAsString();
            // handle JSON
        }
    }
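
Since thousands of ids are looked up at a time, it may be worth splitting them over several multi-get requests. Whether that helps depends on the cluster; the sketch below only shows the splitting, with a hypothetical helper IdBatches.partition and a batch size chosen by the caller:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list of ids into consecutive batches.
final class IdBatches {
    private IdBatches() {
    }

    /** Returns batches of at most batchSize ids, preserving the original order. */
    static List<List<String>> partition(final List<String> ids, final int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        final List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            final int to = Math.min(from + batchSize, ids.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(ids.subList(from, to)));
        }
        return batches;
    }
}
```

Each batch can then be passed to builder.add(index_name, type, batch) on its own request builder, as in the multi-get example above.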
    

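    If you only want one document:

    // execute the request; isExists() is false when no document has this id
    final GetResponse response = client.prepareGet().setIndex(index).setType(type).setId(id).execute().actionGet();
    final boolean exists = response.isExists();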
    

    Doing the same over the REST API (e.g. with curl) is shown in the mapping-id-field page of the documentation (note: exact copy):

    # Example documents
    PUT my_index/my_type/1
    {
      "text": "Document with ID 1"
    }
    
    PUT my_index/my_type/2
    {
      "text": "Document with ID 2"
    }
    
    GET my_index/_search
    {
      "query": {
        "terms": {
          "_id": [ "1", "2" ] 
        }
      },
      "script_fields": {
        "UID": {
          "script": "doc['_id']" 
        }
      }
    }