Tags: elasticsearch, elasticsearch-6

ElasticSearch: How to move a field to a different level with existing data?


Say I have:

PUT /test/_doc/1
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch",
    "data": {
        "modified_date": "2018-11-15T14:12:12",
        "password": "abcpassword"
    }
}

Then I get the following mapping:

GET /test/_mapping/_doc
{
    "test": {
        "mappings": {
            "_doc": {
                "properties": {
                    "data": {
                        "properties": {
                            "modified_date": {
                                "type": "date"
                            },
                            "password": {
                                "type": "text",
                                "fields": {
                                    "keyword": {
                                        "type": "keyword",
                                        "ignore_above": 256
                                    }
                                }
                            }
                        }
                    },
                    "message": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "post_date": {
                        "type": "date"
                    },
                    "user": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            }
        }
    }
}

How can I reindex the mapping to bring modified_date to the same level as user and not lose any data?

{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch",
    "modified_date": "2018-11-15T14:12:12",
    "data": {
        "password": "abcpassword"
    }
}

Solution

  • I'd suggest using an ingest node with a pipeline; both are covered in detail in the Elasticsearch documentation.

    Basically, you construct a pipeline and reference it during the indexing or reindexing process, so that each document goes through the pre-processing defined in the pipeline before it is actually stored in the destination index.

    I've created the pipeline below for your use case. It adds a new top-level field modified_date with the required value and removes the field data.modified_date. Any fields not mentioned in the pipeline are left untouched and ingested into the destination index as-is.

    Create/Add Pipeline

    PUT _ingest/pipeline/mydatepipeline
    {
      "description" : "modified date pipeline",
      "processors" : [
        {
          "set" : {
            "field": "modified_date",
            "value": "{{data.modified_date}}"
          }
        },
        {
          "remove": {
            "field": "data.modified_date"
          }
        }
      ]
    }
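
    Before reindexing, you can verify the pipeline's behaviour with the simulate API. The sketch below runs the sample document from the question through mydatepipeline:

    POST _ingest/pipeline/mydatepipeline/_simulate
    {
      "docs": [
        {
          "_source": {
            "user" : "kimchy",
            "post_date" : "2009-11-15T14:12:12",
            "message" : "trying out Elasticsearch",
            "data": {
                "modified_date": "2018-11-15T14:12:12",
                "password": "abcpassword"
            }
          }
        }
      ]
    }

    The response shows each transformed document, so you can confirm modified_date has moved to the top level before touching any index.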
    

    Once the pipeline above is created, use it to perform the reindexing.

    Usage 1: During Reindexing to New Index

    POST _reindex
    {
      "source": {
        "index": "test"
      },
      "dest": {
        "index": "test_dest",
        "pipeline": "mydatepipeline"
      }
    }
    

    The documents will be transformed as you expect and indexed into the test_dest index. Note that you need to create test_dest explicitly, with the mapping details matching your requirement.
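
    For instance, the destination index could be created with a mapping like the sketch below (the field types simply mirror the original mapping, with modified_date promoted to the top level as a date; adjust to your needs):

    PUT test_dest
    {
      "mappings": {
        "_doc": {
          "properties": {
            "user": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            },
            "post_date": { "type": "date" },
            "message": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            },
            "modified_date": { "type": "date" },
            "data": {
              "properties": {
                "password": {
                  "type": "text",
                  "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
                }
              }
            }
          }
        }
      }
    }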

    Usage 2: Using pipeline during bulk operations before indexing

    You can apply it during a bulk operation as follows:

    POST _bulk?pipeline=mydatepipeline
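
    The bulk body is newline-delimited JSON: an action line followed by a source line for each document. A minimal sketch (the document values here are made up for illustration):

    POST _bulk?pipeline=mydatepipeline
    { "index": { "_index": "test", "_type": "_doc", "_id": "2" } }
    { "user": "jane", "post_date": "2010-01-15T10:00:00", "message": "another post", "data": { "modified_date": "2018-12-01T09:30:00", "password": "xyzpassword" } }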
    

    Usage 3: Using the pipeline on individual docs during indexing

    PUT test/_doc/1?pipeline=mydatepipeline
    {
      "user" : "kimchy",
      "post_date" : "2009-11-15T14:12:12",
      "message" : "trying out Elasticsearch",
      "data": {
          "modified_date": "2018-11-15T14:12:12",
          "password": "abcpassword"
      }
    }
    

    For both Usage 2 and Usage 3, make sure the mapping of the target index is created accordingly beforehand.
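
    After indexing through the pipeline, you can fetch the document to confirm the transformation; the _source should now contain modified_date at the top level and data with only password left inside it:

    GET test/_doc/1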

    Hope this helps!