amazon-web-services, elasticsearch, hashmap, kibana

Remove Fields from ES


I am getting the following error on ES:

[Elasticsearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] in index [<index_name>] has been exceeded]]
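
(The limit it refers to is the index.mapping.total_fields.limit index setting; the request below is included only to make clear which knob I am choosing not to turn, with 2000 as an arbitrary example value.)

PUT <index_name>/_settings
{
  "index.mapping.total_fields.limit": 2000
}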

I don't want to raise this limit, since it can lead to a memory explosion. After going through a lot of solutions on Stack Overflow, I found that I need to create a backup index, which I created like this:

PUT /<dest_index>

Now, I need to copy the data from the existing index to the new one created above, while removing the unwanted field.

So, I tried this:

Created a pipeline for removing the field:

PUT _ingest/pipeline/removePropertyMap
{
  "description": "Removes the 'propertyMap' field", 
  "processors": [
    {
      "remove": {
        "field" : "propertyMap"
      }
    }
  ]
}
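
(For anyone who wants to sanity-check the pipeline itself, it can be dry-run with the simulate API; the document below is just a made-up sample:)

POST _ingest/pipeline/removePropertyMap/_simulate
{
  "docs": [
    {
      "_source": {
        "project": "demo",
        "propertyMap": {
          "90001": "some value"
        }
      }
    }
  ]
}

The simulated result should come back without the propertyMap field.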

And I am copying the data like this:

POST _reindex
{
  "source": {
    "index": "<source_index>"
  },
  "dest": {
    "index": "<dest_index>",
    "pipeline": "removePropertyMap"
  }
}
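
(As a side check, a query like the one below shows whether any copied documents still contain the field at all, independent of what the mapping says; <dest_index> is the same placeholder as above:)

GET <dest_index>/_search
{
  "query": {
    "exists": {
      "field": "propertyMap"
    }
  }
}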

After this, I still see propertyMap as a field in the mapping of the new index.

I am checking the mapping through:

GET <dest_index>/_mapping

Now the field which I want to delete looks like this:

"project": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256
        }
    }
},
"properties": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256
        }
    }
},
"propertyMap": {
    "properties": {
        "90001": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            }
        },
        ...
    }
}

Here is the same field within the full mapping structure:

{
    "<index>": {
        "mappings": {
            "_doc": {
                "properties": {
                    "propertyMap": {
                        "properties": {
                            "field1": {
                                "type": "text",
                                "fields": {
                                    "keyword": {
                                        "type": "keyword",
                                        "ignore_above": 256
                                    }
                                }
                            },
                            "field2": {
                                "properties": {
                                    "anotherField": {
                                        "type": "text",
                                        "fields": {
                                            "keyword": {
                                                "type": "keyword",
                                                "ignore_above": 256
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

Inside propertyMap's properties there are a huge number of other fields as well, just like field1 and field2.

What am I doing wrong here?


Solution

  • I'm not too familiar with pipelines, but just as an alternative solution, have you considered the opposite approach (specifying which fields you want to keep)?

    POST _reindex
    {
      "source": {
        "index": "<source_index>",
        "_source": ["keep_field_1", "keep_field_2"]
      },
      "dest": {
        "index": "<dest_index>"
      }
    }
    

    The list is probably a lot longer since you're near the 1,000-field limit, but you should be able to get it relatively easily from your mapping.
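
    To build that keep list, one option (assuming the 6.x-style _doc mapping shown in the question) is to trim the mapping response with filter_path, so that it returns roughly one entry per top-level leaf field:

    GET <source_index>/_mapping?filter_path=*.mappings._doc.properties.*.type

    Fields that only hold nested properties (like propertyMap) have no top-level type and drop out of this view, so any object fields you want to keep would still have to be added to the list by hand.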