I have a use case where an API i'm calling to retrieve data to put into elasticsearch is returning nulls.
I need to write an ingest pipeline that uses processors to remove all null fields before writing it into elasticsearch. Processors may or may not use painless scripting.
Here is a sample payload that i currently get from the API
{
"master_desc": "TESTING PART",
"date_added": "2019-10-24T09:30:03",
"master_no": {
"master_no": 18460110,
"barcode": "NLSKYTEST1-1",
"external_key": null,
"umid": null
}
}
The pipeline should ideally insert the document as -
{
"master_desc": "TESTING PART",
"date_added": "2019-10-24T09:30:03",
"master_no": {
"master_no": 18460110,
"barcode": "NLSKYTEST1-1"
}
}
Note, the fields are dynamic so i can't write a processor that checks for nulls against a defined set of fields.
Thanks!
Null fields are not indexed nor are searchable.I have written below pipeline to remove such fields. Please test it before use on all of your scenarios. After posting documents using this pipeline, you won't be able to search null fields using "exists"
Pipeline:
PUT _ingest/pipeline/remove_null_fields
{
"description": "Remove any null field",
"processors": [
{
"script": {
"source": """
// return list of field with null values
def loopAllFields(def x){
def ret=[];
if(x instanceof Map){
for (entry in x.entrySet()) {
if (entry.getKey().indexOf("_")==0) {
continue;
}
def val=entry.getValue();
if( val instanceof HashMap ||
val instanceof Map ||
val instanceof ArrayList)
{
def list=[];
if(val instanceof ArrayList)
{
def index=0;
// Call for each object in arraylist
for(v in val)
{
list=loopAllFields(v);
for(item in list)
{
ret.add(entry.getKey()+"["+index+"]."+ item);
}
index++;
}
}
else
{
list =loopAllFields(val);
}
if(list.size()==val.size())
{
ret.add(entry.getKey());
}
else{
for(item in list)
{
ret.add(entry.getKey()+"."+ item);
}
}
}
if(val==null)
{
ret.add(entry.getKey());
}
}
}
return ret;
}
/* remove fields from source, recursively deletes fields which part of other fields */
def removeField(def ctx, def fieldname)
{
def pos=fieldname.indexOf(".");
if(pos>0)
{
def str=fieldname.substring(0,pos);
if(str.indexOf('[')>0 && str.indexOf(']')>0)
{
def s=str.substring(0,str.indexOf('['));
def i=str.substring(str.indexOf('[')+1,str.length()-1);
removeField(ctx[s][Integer.parseInt(i)],fieldname.substring(pos+1,fieldname.length()));
}
else
{
if(ctx[str] instanceof Map)
{
removeField(ctx[str],fieldname.substring(pos+1,fieldname.length()));
}
}
}else{
ctx.remove(fieldname);
}
return ctx;
}
def list=[];
list=loopAllFields(ctx);
for(item in list)
{
removeField(ctx,item);
}
"""
}
}
]
}
Post Document:
POST index8/_doc?pipeline=remove_null_fields
{
"master_desc": "TESTING PART",
"ddd":null,
"date_added": "2019-10-24T09:30:03",
"master_no": {
"master_no": 18460110,
"barcode": "NLSKYTEST1-1",
"external_key": null,
"umid": null
}
}
Result:
"hits" : [
{
"_index" : "index8",
"_type" : "_doc",
"_id" : "06XAyXEBAWHHnYGOSa_M",
"_score" : 1.0,
"_source" : {
"date_added" : "2019-10-24T09:30:03",
"master_no" : {
"master_no" : 18460110,
"barcode" : "NLSKYTEST1-1"
},
"master_desc" : "TESTING PART"
}
}
]