My manager uses a query like this from massive views in Teradata. It takes 10 minutes to run.
select c1, c2, sum(c3), sum(c4) etc.
from (select * from v1 where [some condition]) v1
left join (select * from v2 where [some condition]) v2
on v1.some_id = v2.some_id
group by 1, 2;
The views v1 and v2 are very large - on the order of a billion rows and thousands of columns. But we're only selecting a handful of columns. So is it inefficient to do a "select *"?
Does it help the query if I specify only the columns we need instead of "select *"? I know it can't hurt, so I will do it anyway; I just want to confirm Teradata's behavior and logic.
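For reference, this is a sketch of the explicit-column version I have in mind (assuming c1/c2 come from v1, c3/c4 come from v2, and some_id is the join key, as in the query above; the bracketed conditions are placeholders):

```sql
select v1.c1, v1.c2, sum(v2.c3), sum(v2.c4)
from (select some_id, c1, c2 from v1 where [some condition]) v1
left join (select some_id, c3, c4 from v2 where [some condition]) v2
  on v1.some_id = v2.some_id
group by 1, 2;
```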
It is probably efficient. It can be written more concisely as:
select c1, c2, sum(c3), sum(c4) etc.
from v1 left join
v2
on v1.some_id = v2.some_id and <conditions on v2>
where <conditions on v1>
group by 1, 2;
Teradata has a smart optimizer, so the subqueries shouldn't affect the query plan. That said, they can confuse readers ("What are these subqueries doing?").
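If you want to verify this rather than take it on faith, Teradata's EXPLAIN modifier shows the plan without running the query; a sketch, reusing the names and placeholder conditions from the question:

```sql
explain
select c1, c2, sum(c3), sum(c4)
from (select * from v1 where [some condition]) v1
left join (select * from v2 where [some condition]) v2
  on v1.some_id = v2.some_id
group by 1, 2;
```

Running EXPLAIN on both the subquery form and the flattened form should produce the same (or near-identical) steps if the optimizer is folding the subqueries away.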
Sometimes such subqueries are left over from earlier iterations of the query. For instance, there might have been an attempt to calculate something in the subquery (say, a row number). In the end that was not needed, but the subquery remained.
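For example, a hypothetical earlier version of the query might have looked like this, with a ROW_NUMBER() computed in the subquery and later abandoned (the rn column is illustrative only, not from the question):

```sql
select c1, c2, sum(c3), sum(c4)
from (select v1.*,
             row_number() over (partition by some_id order by c1) as rn
      from v1
      where [some condition]) v1
left join v2
  on v1.some_id = v2.some_id
group by 1, 2;
```

Once rn stops being referenced anywhere, the derived table serves no purpose, but it is easy to leave it in place.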
Also, the performance is probably driven by the views themselves and by how much data is needed for the aggregation.