Search code examples
sqldb2mainframe

Left Outer Join Vs Individual Queries DB2


I am using Mainframe Db2. Its a patient database. My requirement is to fetch different information from 3 tables, TABACC, TABPAY and TABINS. TABACC will always have a row for a patient but TABPAY and TABINS may or may not have a row for a patient in the system.

Which will be more efficient, a QUERY on TABACC with LEFT OUTER JOIN to TABPAY and TABINS or three different queries, one for each TABACC, TABPAY and TABINS.


Solution

  • It depends.

    If all you doing is pulling back 1 row data from three tables, then it's hard to beat COBOL's random read. There's simply less overhead. SQL isn't magic.

    But you mention writing out to a file. So lets assume that you're pull back 100s, 1,000s or even millions of rows from those files and outputting them to a new file.

    Instead of working row by row, which is COBOL's only option and all too often used when using SQL. You could work with the entire set in SQL

    insert into newtable 
      (SELECT TB1.COL1,
              TB2.COL4, 
              TB3.COL5, 
              TB4.COL6
      FROM TB1 JOIN TB2 ON TB1.KEY = TB2.KEY
      LEFT OUTER JOIN TB3 ON TB1.KEY = TB3.KEY
      LEFT OUTER JOIN TB4 ON TB1.KEY = TB4.KEY)
    

    Now the SQL solution should be much, much faster.

    The key with SQL is to think in sets. If you are doing something row by row (aka using a cursor) you're probably (but not always) doing something wrong.

    You can't simply change from COBOL's native I/O to SQL and expect better performance. It will in fact be worse.

    Lastly, consider what the output file is being used for. If you're exporting data to an outside system, then you're pretty much done. But if you're writing a work file for another COBOL program to process...well you've probably got an opportunity for more improvement. Look at the entire process, consider what's being done as a whole and how a set based SQL solution could do it.