Search code examples
sqldb2subquery

Best way to understand big and complex SQL queries with many subqueries


I just started in a new project, in a new company.

I was given a big and complex SQL, with about 1000 lines and MANY subqueries, joins, sums, group by, etc.

This SQL is used for report generation (it has no inserts nor updates).

The SQL has some flaws, and my first job in the company is to identify and correct these flaws so that the report shows the correct values (I know the correct values by accessing a legacy system written in Cobol...)

How can I make it easier for me to understand the query, so I can identify the flaws?

As an experienced Java programmer, I know how to refactor a complex bad written monolitic Java code into an easier to understand code with small pieces of code. But I have no clue on how to do that with SQL.

The SQL looks like this:

SELECT columns
FROM
    (SELECT columns
    FROM
        (SELECT DISTINCT columns
              FROM table000 alias000
              INNER JOIN                                          
                      table000 alias000             
               ON column000 = table000.column000

              LEFT JOIN
                 (SELECT columns
                    FROM (
                    SELECT DISTINCT columns  
                      FROM columns        
                     WHERE conditions) AS alias000
                        GROUP BY columns ) alias000
                   ON
                    conditions
             WHERE conditions
            ) AS alias000
                 LEFT JOIN
                  (SELECT
                    columns  
                    FROM many_tables                
     WHERE many_conditions 
            ) )
        ) AS alias000
     ON condition               
       LEFT JOIN (      
    SELECT columns
    FROM            
    (SELECT
      columns
    FROM                                                   
       many_tables       
     WHERE many_conditions
            ) ) ) AS alias001  
        ,
         (SELECT
           many_columns 
         FROM                                                   
            many_tables            
           WHERE many_conditions) AS alias001
            ) AS alias001
        ON condition
    LEFT JOIN 
        (SELECT                                     
         many_columns
       FROM many_tables               
       WHERE many_conditions
          ) AS alias001
        ON condition
        ,    
         (SELECT  DISTINCT columns
           FROM table001 alias001
           INNER JOIN                                          
                 table001 alias001              
           ON condition
            LEFT JOIN
            (SELECT columns 
            FROM (                                           
         SELECT  DISTINCT columns
          FROM tables        
          WHERE conditions
         ) AS alias001
        GROUP BY                                                  
             columns ) alias001
            ON
             condition
             WHERE                                                
                 conditions
         ) AS alias001
         LEFT JOIN
         (SELECT columns
            FROM tables            
            WHERE conditions
              ) AS alias001
            ON condition
            LEFT JOIN ( 
                 SELECT columns
             FROM
             (SELECT columns
              FROM tables               
              WHERE conditions ) AS alias001
                    ,
                    (SELECT
                        columns
                 FROM                                                   
                   tables
                 WHERE conditions ) AS alias001
                ) AS alias001
                ON condition
                 LEFT JOIN 
                                (SELECT                                     
                   columns
                 FROM                                                   
                   tables
                 WHERE conditions
                     ) AS alias001
                   ON condition
    WHERE 
    condition
    ) AS alias001
    order by column001

How can I make it easier for me to understand the query, so I can identify the flaws?


Solution

  • The solution was to simplify the query using COMMON TABLE EXPRESSIONS.

    This allowed me to break the big and complex SQL query into many small and easy to understand queries.

    COMMON TABLE EXPRESSIONS:

    • Can be used to break up complex queries, especially complex joins and sub-queries
    • Is a way of encapsulating a query definition.
    • Persist only until the next query is run.
    • Correct use can lead to improvements in both code quality/maintainability and speed.
    • Can be used to reference the resulting table multiple times in the same statement (eliminate duplication in SQL).
    • Can be a substitute for a view when the general use of a view is not required; that is, you do not have to store the definition in metadata.

    Example:

    WITH cte (Column1, Column2, Column3)
    AS
    (
        SELECT Column1, Column2, Column3
        FROM SomeTable
    )
    
    SELECT * FROM cte
    

    My new SQL looks like this:

    ------------------------------------------
    --COMMON TABLE EXPRESSION 001--
    ------------------------------------------
    WITH alias001 (column001, column002) AS (
        SELECT column005, column006
        FROM table001
        WHERE condition001
        GROUP by column008
    )
    
    --------------------------------------------
    --COMMON TABLE EXPRESSION 002 --
    --------------------------------------------
    , alias002 (column009) as (
        select distinct column009 from table002
    )
    
    --------------------------------------------
    --COMMON TABLE EXPRESSION 003 --
    --------------------------------------------
    , alias003 (column1, column2, column3) as (
        SELECT '1' AS column1, '1' as column2, 'name001' AS column3 FROM SYSIBM.SYSDUMMY1
        UNION ALL
        SELECT '1' AS column1, '1.1' as column2, 'name002' AS column3 FROM SYSIBM.SYSDUMMY1
        UNION ALL
        SELECT '1' AS column1, '1.2' as column2, 'name003' AS column3 FROM SYSIBM.SYSDUMMY1
        UNION ALL
        SELECT '2' AS column1, '2' as column2, 'name004' AS column3 FROM SYSIBM.SYSDUMMY1
        UNION ALL
        SELECT '2' AS column1, '2.1' as column2, 'name005' AS column3 FROM SYSIBM.SYSDUMMY1
        UNION ALL
        SELECT '2' AS column1, '2.2' as column2, 'name006' AS column3 FROM SYSIBM.SYSDUMMY1
        UNION ALL
        SELECT '3' AS column1, '3' as column2, 'name007' AS column3 FROM SYSIBM.SYSDUMMY1
        UNION ALL
        SELECT '3' AS column1, '3.1' as column2, 'name008' AS column3 FROM SYSIBM.SYSDUMMY1
    )
    --------------------------------------------
    --COMMON TABLE EXPRESSION 004 --
    --------------------------------------------
    , alias004 (column1) as (
        select distinct column1 from table003
    )
    
    ------------------------------------------------------
    --COMMON TABLE EXPRESSION 005 --
    ------------------------------------------------------
    , alias005 (column1, column2) as (
        select column1, column2 from alias002, alias004
    )
    
    ------------------------------------------------------
    --COMMON TABLE EXPRESSION 006 --
    ------------------------------------------------------
    , alias006 (column1, column2, column3, column4) as (
        SELECT column1, column2, column3, sum(column0) as column4
        FROM table004
        LEFT JOIN table005 ON column01 = column02
        group by column1, column2, column3
    )
    
    ------------------------------------------------------
    --COMMON TABLE EXPRESSION 007 --
    ------------------------------------------------------
    , alias007 (column1, column2, column3, column4) as (
        SELECT column1, column2, column3, sum(column0) as column4
        FROM table006
        LEFT JOIN table007 ON column01 = column02
        group by column1, column2, column3
    )
    
    ------------------------------------------------------
    --COMMON TABLE EXPRESSION 008 --
    ------------------------------------------------------
    , alias008 (column1, column2, column3, column4) as (
        select column1, column2, column3, column4 from alias007 where column5 = 123
    )
    
    ----------------------------------------------------------
    --COMMON TABLE EXPRESSION 009 --
    ----------------------------------------------------------
    , alias009 (column1, column2, column3, column4) as (
        select column1, column2, 
        CASE WHEN column3 IS NOT NULL THEN column3 ELSE 0 END as column3, 
        CASE WHEN column4 IS NOT NULL THEN column4 ELSE 0 END as column4
        from table007
    )
    
    ----------------------------------------------------------
    --COMMON TABLE EXPRESSION 010 --
    ----------------------------------------------------------
    , alias010 (column1, column2, column3) as (
        select column1, sum(column4), sum(column5) 
        from alias009 
        where column6 < 2005 
        group by column1
    )
    
    --------------------------------------------
    --             MAIN QUERY            --
    --------------------------------------------
    
    select j.column1, n.column2, column3, column4, column5, column6, 
    column3 + column5 AS column7,
    column4 + column6 AS column8
    from alias010 j
    left join alias006 m ON (m.column1 = j.column1)
    left join alias008 n ON (n.column1 = j.column1)