Why does a script work in the reindex API but not as a script processor in an ingest pipeline?


I create indices based on projectId like so:

Calling the reindex API directly works fine:

POST _reindex?wait_for_completion=false
{
  "conflicts": "proceed",
  "source": {
    "index": "xxxxxx-rlk-test1-2021-07-22"
  },
  "dest": {
    "index": "xxxxxx",
    "op_type": "create"
  },
  "script": {
    "lang": "painless",
    "source": """
          if (ctx._source.kubernetes != null){
            if (ctx._source.kubernetes.namespace_labels['field_cattle_io/projectId'] != null){
              ctx._index = 'xxxxxx-rlk-'+ (ctx._source.kubernetes.namespace_labels['field_cattle_io/projectId']) + '' + (ctx._index.substring('xxxxxx-rlk-test-'.length(), ctx._index.length()))
            }else {
              ctx._index = 'xxxxxx-rlk-'+ (ctx._source.kubernetes.namespace_labels['field_cattle_io/projectId']) +'-noproject'
            }
          }
      """
  }
}

But when I try to use reindex with a pipeline like so:

PUT _ingest/pipeline/group-by-projectid-pipeline
{
  "description": "this pipeline splits indices by projectId",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          if (ctx.kubernetes != null){
            if (ctx.kubernetes.namespace_labels['field_cattle_io/projectId'] != null){
              ctx._index = 'xxxxxx-rlk-'+ (ctx.kubernetes.namespace_labels['field_cattle_io/projectId']) +'' + (ctx._index.substring('xxxxxx-rlk-test-'.length(), ctx._index.length()))
            }else {
              ctx._index = 'xxxxxx-rlk-'+ (ctx.kubernetes.namespace_labels['field_cattle_io/projectId']) +'-noproject'
            }
          }
      """
      }
    }
  ]
}

and:

POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "index": "xxxxxx-rlk-test1-2021-07-22"
  },
  "dest": {
    "index": "xxxxxx",
    "pipeline": "group-by-projectid-pipeline",
    "op_type": "create"
  }
}

Then Elasticsearch reports the following error, pointing at ctx._index.substring('xxxxxx-rlk-test-'.length(), ctx._index.length()):

"type" : "string_index_out_of_bounds_exception", "reason" : "begin 16, end 6, length 6"

Thank you in advance for your help!


Solution

  • This is because the script does not run at the same point in the two cases.

    During a reindex call without a pipeline, the script executes before the document lands in the destination index, hence ctx._index is the name of the source index, i.e. xxxxxx-rlk-test1-2021-07-22, so your substring call works.

    During a reindex call with a pipeline, the script processor runs at the time the document is about to land in the destination index, hence ctx._index is the name of the destination index, i.e. xxxxxx.

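    This is why 'xxxxxx'.substring(16, 6) fails: the begin offset (16, the length of 'xxxxxx-rlk-test-') is larger than the end offset (6, the length of 'xxxxxx'). So you need to proceed differently in the second case.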

    The easy way out of this (if you want to keep the same logic) is to use a dummy destination index name that has the same length and date suffix as the source one; the pipeline overwrites it anyway:

    POST _reindex
    {
      "conflicts": "proceed",
      "source": {
        "index": "xxxxxx-rlk-test1-2021-07-22"
      },
      "dest": {
        "index": "xxxxxx-rlk-xxxxx-2021-07-22",        <--- change this
        "pipeline": "group-by-projectid-pipeline",
        "op_type": "create"
      }
    }
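
To make the offset arithmetic concrete, here is a small Python sketch that models the script's index-name expression outside Elasticsearch. It is only an illustration: java_substring and target_index are hypothetical helpers written for this example, "p-abc" is a made-up projectId, and the bounds check is assumed to mirror the one Java's String.substring applies (Painless exposes the same method).

```python
PREFIX = "xxxxxx-rlk-test-"  # 16 characters, as in the script


def java_substring(s: str, begin: int, end: int) -> str:
    """Hypothetical helper: mimic the bounds check of Java's String.substring."""
    if begin < 0 or begin > end or end > len(s):
        raise IndexError(f"begin {begin}, end {end}, length {len(s)}")
    return s[begin:end]


def target_index(ctx_index: str, project_id: str) -> str:
    """Hypothetical helper: mirror the script's expression for a non-null projectId."""
    suffix = java_substring(ctx_index, len(PREFIX), len(ctx_index))
    return "xxxxxx-rlk-" + project_id + suffix


# Script in the _reindex body: ctx._index is the source index.
print(target_index("xxxxxx-rlk-test1-2021-07-22", "p-abc"))
# -> xxxxxx-rlk-p-abc-2021-07-22

# Script processor with the dummy destination of the same length.
print(target_index("xxxxxx-rlk-xxxxx-2021-07-22", "p-abc"))
# -> xxxxxx-rlk-p-abc-2021-07-22

# Script processor with dest.index = "xxxxxx": the substring call fails.
try:
    target_index("xxxxxx", "p-abc")
except IndexError as err:
    print(err)
# -> begin 16, end 6, length 6
```

The dummy destination name works only because it keeps the date suffix at the same offset as in the source name; any destination name shorter than 16 characters reproduces the error from the question.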