
Run Elasticsearch processor on all the fields of a document


I am trying to trim and lowercase every value of a document as it is indexed into Elasticsearch.

The available processors all treat the field key as mandatory, which means a processor can be applied to only one field at a time.

Is there a way to run a processor on all the fields of a document?


Solution

  • There sure is. Use a script processor, but beware of reserved keys such as _type and _id:

    PUT _ingest/pipeline/my_string_trimmer
    {
      "description": "Trims and lowercases all string values",
      "processors": [
        {
          "script": {
            "source": """
              def forbidden_keys = [
                '_type',
                '_id',
                '_version_type',
                '_index',
                '_version'
              ];
              
              def corrected_source = [:];
              
              for (pair in ctx.entrySet()) {
                def key = pair.getKey();
                if (forbidden_keys.contains(key)) {
                  continue;
                }
                def value = pair.getValue();
                
                if (value instanceof String) {
                  corrected_source[key] = value.trim().toLowerCase();
                } else {
                  corrected_source[key] = value;
                }
              }
              
              // overwrite the original
              ctx.putAll(corrected_source);
            """
          }
        }
      ]
    }
    

    Test with a sample doc:

    POST my-index/_doc?pipeline=my_string_trimmer
    {
      "abc": " DEF ",
      "def": 123,
      "xyz": false
    }
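
    To check the pipeline without indexing anything, the same sample doc can also be sent to the simulate endpoint. This is a minimal sketch that assumes the my_string_trimmer pipeline above has already been created:

    POST _ingest/pipeline/my_string_trimmer/_simulate
    {
      "docs": [
        {
          "_source": {
            "abc": " DEF ",
            "def": 123,
            "xyz": false
          }
        }
      ]
    }

    In the returned _source, abc should come back as "def", while def and xyz stay unchanged, because the script only rewrites top-level string values.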