Why is the below query taking forever to complete whereas if queried separately returns within seconds?
select count(*),count(distinct a), count(distinct b) from test.tablename;
However, if queried either of the below two queries it returns in seconds. e.g.
select count(*),count(distinct a) from test.tablename;
or
select count(*),count(distinct b) from test.tablename;
anything to speed up? Already ran a analyze for compute stats. hive configurations are mostly optimized.
UPDATE: apologies. missed the count in the question.
Alternative approach if you do not have too big counts (too big arrays will cause OOM). size(collect_set())
will give you the distinct count.
select count(*), size(collect_set(a)), size(collect_set(b)) from test.tablename;