Search code examples
hadoopnullhiveemrtez

Hive 2.1 cannot doesn't evaluate properly NULL map


I'm trying to find an workaround for checking for NULL maps in Hive 2.1. When I'm checking the keys inside the map I get the correct result. Here's what I'm trying to evaluate and also the result:

hive> select request_params, request_params is null, request_params['id'], request_params['id'] is null  from dcs_log_pday_s3 where pday='2016-11-24' and hour='00' and pid=1124 limit 10;
OK
{"id":"EAED7055-7003-4A11-8494-CF1079EF24","et%3Aivs%7Cdata%3Ad":"1966"}    false   EAED7055-7003-4A11-8494-CF1079EF24  false
NULL    false   NULL    true
NULL    false   NULL    true
NULL    false   NULL    true
NULL    false   NULL    true
NULL    false   NULL    true
NULL    false   NULL    true
NULL    false   NULL    true
NULL    false   NULL    true
{"id":"97EBD924-C45C-4A93-AAEF-57595005","et%3Aibs%7Cdata%3Ad":"1966"}  false   97EBD924-C45C-4A93-AAEF-57595005    false
Time taken: 0.095 seconds, Fetched: 10 row(s)

However, I haven't the same issue on old versions of Hive (eg on Hive 1.1):

hive> select request_params, request_params is null  from dcs_log_pday_s3 where 
pday='2016-11-24' and hour='00' and pid=1124 limit 10;
Query ID = keystone_20161126000606_0a8caadd-ec27-4a92-96a9-5e7b0b95f318
...
Total MapReduce CPU Time Spent: 1 minutes 2 seconds 750 msec
OK
{"id":"3AFCDA87-BAE5-4EEA-9B47-61BA2B4D0BC8","et%3Aibs%7Cdata%3Ad":"1966"}  false
NULL    true
NULL    true
NULL    true
NULL    true
NULL    true
NULL    true
NULL    true
{"id":"8D0F1D5F-57E7-4127-94C0-C1FD4F98D6E8","et%3Aibs%7Cdata%3Ad":"1966"}  false

I can add that the first query I'm running on AWS EMR 5.2 and the second query is running on a local CDH5.4 cluster.

Can someone give me some insight?


Solution

  • I have found out a workaround for this by using size UDF. It seems like when evaluating NULLs it returns -1. (https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSize.java#L74)

    Here's a query run:

    hive> select size(null) from emr_output limit1;
    OK
    -1