with core as (
select
t1.a,
t1.b,
t2.c
from
test_1 t1
join
test_2 t2 on t1.a = t2.a
where
t1.b = 'test'
)
,tmp_1 as (
select a,count(1) from core group by a
)
,tmp_2 as (
select b,count(1) from core group by b
)
select
t1.a,
t1.count,
t2.count
from
tmp_1 t1
join
tmp_2 t2 on t1.a=t2.b
My question is, does the core
CTE in above query just compute once? or twice in both the tmp_1
and tmp_2
CTEs,I could not find clues in the documentation of impala,
As of now, with present versions of Impala and Hive, core CTE in your query will be computed 2 times, as you are referring it in tmp_1 and tmp_2.
You can observe it in the EXPLAIN PLAN of the query.
Conclusion is that, you can use WITH clause for better debugging and maintainability of the complex queries. It doesn't help in increasing the performance of the Hive or Impala Query. As per the Hive JIRA site, there is no route map to include this recursive feature down the line.
Oracle and PostgreSQL supports this recursive feature.