Search code examples
sqlimpala

Impala return null values after 'is not equal' condition


I am trying to print the top 5 sports according to the duration. I ran the below query:

with t1
AS
(
 select cast(duration as int),race,lap,sport
 from db.table1
 where exchange_code in ("tennis", "golf", "football")
 and table1_date = 20201010
 and duration is not null
 and race is not null 
 and lap is not null 
 and sport is not null
)
select sum(duration_int) ,race,lap,sport
from t1
group by race,lap,sport
order by sum(duration_int) desc
limit 5;

And the results are these:

 sum(duration_int) race   lap      sport
 [null]            first  second   golf
 408439363026         
 65886284          fourth third    football
 33687102          fifth  first    american-football    
 22642805          tenth  fifth    english-football 

As you can see I have null values after the condition IS NOT NULL.


Solution

  • As the Impala documentation explains, CAST() does not return an error if it cannot cast the value, it returns NULL:

    If the expression value is of a type that cannot be converted to the target type, the result is NULL.

    Hence, the condition duration is not null is not sufficient. Instead:

    cast(duration as int) is not null
    

    I am not sure why race, lap, and sport would have NULL values. I suspect the values may be the string '[null]'.