
Document with a boolean multi-field fails during creation


In v5.5, we had the following mapping, which was working fine:

PUT multiple_datatypes
{
  "mappings": {
    "_doc": {
      "properties": {
        "user_data": {
          "type": "text",
          "fields": {
            "numeric": {
              "type": "double",
              "ignore_malformed": true
            },
            "date": {
              "type": "date",
              "ignore_malformed": true
            },
            "logical": {
              "type": "boolean"
            }
          }
        }
      }
    }
  }
}

In 6.2, indexing a document into an index with the same mapping fails with the following error:
HTTP/1.1 400 Bad Request
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse [user_data.logical]"}],"type":"mapper_parsing_exception","reason":"failed to parse [user_data.logical]","caused_by":{"type":"illegal_argument_exception","reason":"Failed to parse value [auto_directorUrl] as only [true] or [false] are allowed

The input value was the string "auto_directorURL", and indexing it failed. The ignore_malformed flag is not available for boolean fields, yet this worked in v5.5. It seems that v6.2 strictly enforces boolean values to be 'true' or 'false', which breaks multi-fields like this one because there is no ignore_malformed flag to fall back on. What is the solution for this? Is this a backward-compatibility (BWC) break and a bug?
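To see why the same value behaved differently in the two versions, here is a minimal Python model of the two parsing rules. It is a sketch, not Elasticsearch code: the 5.x false set ("false", "off", "no", "0", "") is assumed from the lenient 5.x boolean rules, and the 6.x rule is simplified to accept only "true" and "false" (null_value and empty-string handling are ignored). The function names are hypothetical.

```python
# Hypothetical model of how a boolean field treats string input.
# Assumption: 5.x was lenient (these strings meant false, anything else true);
# 6.x is simplified here to accept only the exact strings "true" and "false".

LENIENT_FALSE_5X = {"false", "off", "no", "0", ""}


def parse_boolean_5x(value: str) -> bool:
    """Lenient pre-6.0 model: anything outside the false set is coerced to True."""
    return value not in LENIENT_FALSE_5X


def parse_boolean_6x(value: str) -> bool:
    """Strict 6.x model: only "true" or "false" are accepted, anything else raises."""
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(
        f"Failed to parse value [{value}] as only [true] or [false] are allowed"
    )


if __name__ == "__main__":
    # Under the 5.x model the string is silently coerced to True ...
    print(parse_boolean_5x("auto_directorURL"))
    # ... while the 6.x model rejects it, which mirrors the 400 error above.
    try:
        parse_boolean_6x("auto_directorURL")
    except ValueError as exc:
        print(exc)
```

In other words, the value was never really a boolean; 5.x just coerced it to true without complaint.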


Solution

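  • This was an announced breaking change in 6.0: boolean fields now only accept true and false (or the strings "true" and "false") and throw an error for any other value instead of coercing it.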

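    An alternative is to use an ingest node pipeline with a convert processor that stores the boolean value of that field in a separate boolean field, and to remove the "logical" boolean sub-field from the mapping (otherwise it would still reject such values). If the conversion fails, the on_failure handler sets the new field to false: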

    PUT _ingest/pipeline/boolean-pipeline
    {
      "description": "converts the content of the field to a boolean value",
      "processors" : [
        {
          "convert" : {
            "field" : "user_data",
            "target_field" : "user_data_boolean",
            "type": "boolean",
            "on_failure" : [
              {
                "set" : {
                  "field" : "user_data_boolean",
                  "value" : false
                }
              }
            ]
          }
        }
      ]
    }
    

    Then you can index data using that pipeline:

    PUT test/doc/1?pipeline=boolean-pipeline
    {
      "user_data": "true"
    }
    
    PUT test/doc/2?pipeline=boolean-pipeline
    {
      "user_data": "auto_directorURL"
    }
    

    As a result, searching the index returns the following documents, which is what you'd expect:

    "hits" : [
      {
        "_index" : "test",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "user_data" : "auto_directorURL",
          "user_data_boolean" : false
        }
      },
      {
        "_index" : "test",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "user_data" : "true",
          "user_data_boolean" : true
        }
      }
    ]
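    For reference, the behaviour of boolean-pipeline can be modelled in a few lines of Python. This is a hypothetical simulation, not the ingest node's implementation: it assumes the convert processor accepts "true"/"false" case-insensitively and fails on anything else, and that on_failure catches every failure of the processor (including a missing field). Running it on the two sample values reproduces the user_data_boolean values in the hits above.

```python
# Hypothetical simulation of "boolean-pipeline": a convert processor with
# type "boolean" and an on_failure handler that sets the target to false.
# Assumption: conversion accepts "true"/"false" case-insensitively; this is
# a model of the behaviour, not Elasticsearch code.
from typing import Any, Dict


def convert_to_boolean(value: Any) -> bool:
    """Mimic the convert processor's boolean conversion (assumed rules)."""
    text = str(value).lower()
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"[{value}] is not a boolean value, cannot convert to boolean")


def run_boolean_pipeline(source: Dict[str, Any]) -> Dict[str, Any]:
    """Copy user_data into user_data_boolean, falling back to False on failure."""
    doc = dict(source)  # work on a copy, leave the caller's document untouched
    try:
        doc["user_data_boolean"] = convert_to_boolean(doc["user_data"])
    except (KeyError, ValueError):
        # on_failure: any conversion failure (including a missing field) sets false
        doc["user_data_boolean"] = False
    return doc


if __name__ == "__main__":
    for user_data in ("true", "auto_directorURL"):
        print(run_boolean_pipeline({"user_data": user_data}))
```

    Note that the fallback makes an unconvertible string indistinguishable from a genuine "false"; if that matters, the on_failure handler could set a different marker value instead.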