Search code examples
elasticsearchnullelasticsearch-query

How to handle nulls in an Elasticsearch index


I have a SQL table that I am exporting to Elasticsearch.

One of the columns is a numeric field that is nullable, with nulls in some of the records.

When we try to index the table, we get this error:

One of the ETL (BigQuery -> ElasticSearch) jobs for Table : MLS has been ES Failed Chunk of 10000 from index 20000 possibly due to incompatible objects.

Failing BigQuery Table: MLS

Stack Trace of the error:

Traceback (most recent call last): File "/Users/asif/zodiacbackend/zodiacbackend/tasks.py", line 205, in

insertIntoES helpers.bulk(es, doc_generator(dataframe,table)) File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 300, in bulk for ok, item in streaming_bulk(client, actions, *args, **kwargs): File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 230, in streaming_bulk **kwargs File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 158, in _process_bulk_chunk raise BulkIndexError("%i document(s) failed to index." % len(errors), errors) elasticsearch.helpers.errors.BulkIndexError: ('2 document(s) failed to index.', [{'index': {'_index': 'mls', '_type': 'mls', '_id': 'b100qHABEFI45Lp-z3Om', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'mapper [Lot_Size_Sq_Ft] of different type, current_type [text], merged_type [long]'}, 'data': { 'Lot_Size_Sq_Ft': Decimal('13504')}}}])

How do I get the system to recognize nulls?


Solution

  • User WittyID, missed some important things like:

    1. values of null_value must be of the same data-type, of your field, so in his example, he declared integer field but defined NULL as null_values, would throw json_parse_exception and this is mentioned as important in the official link as below:

    The null_value needs to be the same data type as the field. For instance, a long field cannot have a string null_value.

    1. The null_value only influences how data is indexed, it doesn’t modify the _source document, so in your source document, whatever you passed, will be stored, not the one mentioned in null_values param and at query time also you need to use the value null_value param..

    In short, null isn't recognized in ES, hence you can define your custom values for null and then use it to index and query the null values.It's easy to explain the entire thing using the below example, which anybody can try:

    Create index

    {
      "mappings": {
        "properties": {
          "my_signed_integer": {
            "type":"integer",
            "null_value": -1 --> note we defining `null` values as `-1`.
          }
        }
      }
    }
    

    Index doc

    1. store null integer docs

      { "my_number" : null }

    If you get this doc from ES it would e returned as below:

    {
       "_index": "so-6053847",
       "_type": "_doc",
       "_id": "1",
       "_version": 1,
       "_seq_no": 0,
       "_primary_term": 1,
       "found": true,
       "_source": {
          "my_number": null. --> As explained earlier, in source its stored as `null`.
       }
    }
    
    1. Index non-negative value

      { "my_number" : 10 }

    Search query to fetch integer which had null values

    {
      "query": {
        "term": {
          "my_signed_integer": -1 -->notice same `null_value`, you need to mention
        }
      }
    }
    

    Result:

     "hits": [
             {
                "_index": "so-6053847",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                   "my_signed_integer": null --> notice it shows `null`, not `-1`
                }
             }
          ]
    

    Search query for other numbers(not null) ie in our case 10

    {
      "query": {
        "term": {
          "my_signed_integer": 10
        }
      }
    }
    

    Result

    "hits": [
             {
                "_index": "so-6053847",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                   "my_signed_integer": 10 -->source matches the indexed value for this doc
                }
             }
          ]