I am trying to print the top 5 sports according to the duration. I ran the below query:
with t1
AS
(
select cast(duration as int),race,lap,sport
from db.table1
where exchange_code in ("tennis", "golf", "football")
and table1_date = 20201010
and duration is not null
and race is not null
and lap is not null
and sport is not null
)
select sum(duration_int) ,race,lap,sport
from t1
group by race,lap,sport
order by sum(duration_int) desc
limit 5;
And the results are these:
sum(duration_int) race lap sport
[null] first second golf
408439363026
65886284 fourth third football
33687102 fifth first american-football
22642805 tenth fifth english-football
As you can see I have null values after the condition IS NOT NULL
.
As the Impala documentation explains, CAST()
does not return an error if it cannot cast the value, it returns NULL
:
If the expression value is of a type that cannot be converted to the target type, the result is NULL.
Hence, the condition duration is not null
is not sufficient. Instead:
cast(duration as int) is not null
I am not sure why race
, lap
, and sport
would have NULL
values. I suspect the values may be the string '[null]'
.