I am trying to trim and lowercase all the values of the document that is getting indexed into Elasticsearch
The processors available has the field key is mandatory. This means one can use a processor on only one field
Is there a way to run a processor on all the fields of a document?
There sure is. Use a script processor but beware of reserved keys like _type
, _id
etc:
PUT _ingest/pipeline/my_string_trimmer
{
"description": "Trims and lowercases all string values",
"processors": [
{
"script": {
"source": """
def forbidden_keys = [
'_type',
'_id',
'_version_type',
'_index',
'_version'
];
def corrected_source = [:];
for (pair in ctx.entrySet()) {
def key = pair.getKey();
if (forbidden_keys.contains(key)) {
continue;
}
def value = pair.getValue();
if (value instanceof String) {
corrected_source[key] = value.trim().toLowerCase();
} else {
corrected_source[key] = value;
}
}
// overwrite the original
ctx.putAll(corrected_source);
"""
}
}
]
}
Test with a sample doc:
POST my-index/_doc?pipeline=my_string_trimmer
{
"abc": " DEF ",
"def": 123,
"xyz": false
}