Convert two repeated values in array into a string

I have some old documents where a field holds an array with the same value repeated twice, something like this:

          "task" : [

I'm trying to convert this array into a string, because both elements are the same value. I've seen the question "Convert array with 2 equal values to single value", but in my case the problem can't be fixed through Logstash, because it only affects old documents that are already stored.

I was thinking to do something like this:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "description": "Change task field from array to first element of this one",
          "lang": "painless",
          "source": """
            if (ctx['task'][0] == ctx['task'][1]) {
                ctx['task'] = ctx['task'][0];
            }
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_index" : "tasks",
      "_type" : "_doc",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "@timestamp" : "2022-05-03T07:33:44.652Z",
        "task" : ["first_task", "first_task"]
      }
    }
  ]
}

The resulting document is the following:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "tasks",
        "_type" : "_doc",
        "_id" : "1",
        "_source" : {
          "@timestamp" : "2022-05-03T07:33:44.652Z",
          "task" : "first_task"
        },
        "_ingest" : {
          "timestamp" : "2022-05-11T09:08:48.150815183Z"
        }
      }
    }
  ]
}

We can see the task field is reassigned and we have the first element of the array as a value.

Is there a way to modify the data already stored in Elasticsearch and convert all the documents with this characteristic using DSL queries?



  • You can achieve this with the _update_by_query endpoint. Here is an example:

    POST tasks/_update_by_query
    {
      "script": {
        "source": """
          if (ctx._source['task'][0] == ctx._source['task'][1]) {
              ctx._source['task'] = ctx._source['task'][0];
          }
        """,
        "lang": "painless"
      },
      "query": {
        "match_all": {}
      }
    }

    The match_all query is optional: if you omit it, the request still updates all documents. To update only a subset, replace it with a more selective query.

    Keep in mind that running a script to update all documents in the index may cause some performance issues while the update process is running.
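
The script above assumes that every matched document has a task array with at least two elements. If some documents were already converted to a string, or have no task field at all, the index access would probably fail. Below is a minimal sketch of a more defensive variant, written in Python only so the request body can be generated and the guard logic checked locally. The names `PAINLESS_SOURCE`, `build_update_body` and `collapse_task` are hypothetical, and `collapse_task` is just an approximate Python mirror of the Painless logic, not something Elasticsearch runs. The `exists` query is an assumption about how you might narrow the update.

```python
import json

# Painless script with guards: collapse only a two-element list whose
# elements are equal; otherwise skip the write with ctx.op = 'noop'.
PAINLESS_SOURCE = """
def task = ctx._source['task'];
if (task instanceof List && task.size() == 2 && task[0] == task[1]) {
  ctx._source['task'] = task[0];
} else {
  ctx.op = 'noop';
}
"""


def build_update_body():
    """Request body for POST tasks/_update_by_query (hypothetical helper)."""
    return {
        "script": {"lang": "painless", "source": PAINLESS_SOURCE},
        # Assumption: only visit documents that have a value for "task".
        "query": {"exists": {"field": "task"}},
    }


def collapse_task(source):
    """Approximate Python mirror of the guard logic, for local edge-case checks."""
    task = source.get("task")
    if isinstance(task, list) and len(task) == 2 and task[0] == task[1]:
        return {**source, "task": task[0]}
    return source  # corresponds to the 'noop' branch: document left unchanged


if __name__ == "__main__":
    # Print the body so it can be pasted into a console or sent with any HTTP client.
    print(json.dumps(build_update_body(), indent=2))
    print(collapse_task({"task": ["first_task", "first_task"]}))
```

On a large index, the `requests_per_second` and `wait_for_completion=false` URL parameters of `_update_by_query` can throttle the run and move it to a background task, which may reduce the performance impact mentioned above.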