sql teradata database-performance query-performance sqlperformance

Teradata SQL Tuning : What was the purpose of the below code

I tuned a query that was badly skewed written by a Teradata Co. Consultant few years back. The same code was a perpetually high CPU report and it has gotten worse

    SELECT 
    c.child  ,
    a.username ,
    CAST( SUM((((a.AmpCPUTime(DEC(18,3)))+
         ZEROIFNULL(a.ParserCPUTime)) )) AS DECIMAL(18,3)) 
   FROM pdcrinfo.dbqlogtbl a
          LEFT OUTER JOIN (
    SELECT queryid,logdate,
        MIN (objectdatabasename) AS                  objectdatabasename
        FROM pdcrinfo.dbqlobjtbl_hst
        GROUP BY 1,2 ) b                  ON a.queryid=b.queryid 
          JOIN dbc.children c   ON b.objectdatabasename=c.child
    WHERE c.parent ='FINDB'
AND  a.logdate BETWEEN                    '2015-12-01'  AND '2015-12-31'
        and    b.logdate BETWEEN                   '2015-12-01'  AND '2015-12-31'
    GROUP BY 1,
        2,
        3
    ORDER BY 1,
        2,
        3;

I already rewrote the query joining log & obj tables which have the same PI and then doing an exists on the dbc.child table and it runs fabulously - same o/p. But I thought I got lucky just because FINDB does not have any children that are view databases . My question : I am trying to understand what is the purpose of MIN (objectdatabasename) Most of our table database names precede view database names ( which are of the form findb_vw etc ) and so I think he may have tried to eliminate view databases ? The other thing : Why LOJ ( I changed to IJ ) because you want a value for Objectdatabasename . I think LOJ does not apply here

I am not sure so throwing the question open on the stage. So just to clarify - I am not looking for tuning tips. I wanted other perspectives on the MIN ( Objectdatabasename ) code .

Solution

You're right, the Left Join is useless (but the optimizer will change it to an Inner Join anyway, so it's just confusing).

The MIN (objectdatabasename) was probably used to avoid multiple rows for the same queryid resulting in duplicate rows (and maybe to remove the view dbs).

But IMHO the main reason for bad performance is a missing join condition between the DBQL tables. The tables in pdcrinfo should be partitioned by LogDate and you need to add AND a.LogDate=b.LogDate to the existing ON a.queryid=b.queryid to get a fast join (PI + partitioning), otherwise the optimizer must do some kind of preparation or a more expensive sliding window join.