Search code examples
sqlsql-servercode-organization

Is there a better way to organize and make this SQL query more readable?


I have inherited a piece of SQL that I am working on for a reporting system. The system is centered around Purchase Orders, and when they are Created, Transmitted, Acknowledged, and Confirmed. These states are a progressive state. The piece of SQL that I am reviewing is shown below and is from the WHERE clause:

OR (CreateJob.endtime is NULL and DATEDIFF(hour, buy_date, getdate()) > Vendor.expected_create_hours)
OR (TransmitJob.endtime is NULL and DATEDIFF(hour, CreateJob.endtime, getdate()) > Vendor.expected_transmit_hours)
OR (AcknowledgeJob.endtime is NULL and DATEDIFF(hour, TransmitJob.endtime, getdate()) > Vendor.expected_acknowledge_hours)
OR (ConfirmJob.endtime is NULL and DATEDIFF(hour, AcknowledgeJob.endtime, getdate()) > Vendor.expected_confirm_hours)

What I have found is that it is possible for a CreateJob to not have an end time because the job failed. However a job can execute more then one Purchase Order, so there are times when a given PO actually was created, but the job does not receive an end time because a sibling later failed. This creates a scenario where the states still progress for a PO, but it still shows up on this problem report because the CreateJob.endtime is NULL.

So first there are some obvious bugs with the system and architecture above, but that is a separate issue from what I am tackling at the moment.

It seems that I could fix the report though, by basically adding the progressive checks as AND statements. So the first check for the CreateJob changes too:

(     (CreateJob.endtime is NULL and DATEDIFF(hour, buy_date, getdate()) > Vendor.expected_create_hours) 
  AND (TransmitJob.endtime is NULL and DATEDIFF(hour, CreateJob.endtime, getdate()) > Vendor.expected_transmit_hours) 
  AND (AcknowledgeJob.endtime is NULL and DATEDIFF(hour, TransmitJob.endtime, getdate()) > Vendor.expected_acknowledge_hours) 
  AND (ConfirmJob.endtime is NULL and DATEDIFF(hour, AcknowledgeJob.endtime, getdate()) > Vendor.expected_confirm_hours))

Which is ugly and begins to obfuscate things into oblivion. Is there some way I can do the equivalent of the following in SQL? Basically some kind of macro or #define system?

CreateFailed = (CreateJob.endtime is NULL and DATEDIFF(hour, buy_date, getdate()) > Vendor.expected_create_hours)
TransmitFailed = (TransmitJob.endtime is NULL and DATEDIFF(hour, CreateJob.endtime, getdate()) > Vendor.expected_transmit_hours)
AcknowledgeFailed = (AcknowledgeJob.endtime is NULL and DATEDIFF(hour, TransmitJob.endtime, getdate()) > Vendor.expected_acknowledge_hours)
ConfirmFailed = (ConfirmJob.endtime is NULL and DATEDIFF(hour, AcknowledgeJob.endtime, getdate()) > Vendor.expected_confirm_hours)


OR (CreateFailed AND TransmitFailed AND AcknowledgeFailed AND ConfirmFailed)
OR (TransmitFailed AND AcknowledgeFailed AND ConfirmFailed)
OR (AcknowledgeFailed AND ConfirmFailed)
OR (ConfirmFailed)

If there is a common term or name for what I am trying to do maybe someone could add it to the title?


Solution

  • [is] there is a common term or name for what I am trying to do...?

    Abstraction?

    You could hide each of your CreateFailed = ..., TransmitFailed = ..., etc logic into its own VIEW or CTE, then your main query could merely see if a Job exists in one of these tables e.g. (pseudo SQL):

    SELECT Jobs.job_ID, ...
      FROM Jobs
     WHERE EXISTS (
                   SELECT * 
                     FROM CreatedFailedJobs AS T1
                    WHERE Jobs.job_ID = T1.job_ID
                   )
           OR EXISTS (
                      SELECT * 
                        FROM TransmitFailedJobs AS T1
                       WHERE Jobs.job_ID = T1.job_ID
                      )
           OR EXISTS (
                      SELECT * 
                        FROM AcknowledgeFailedJobs AS T1
                       WHERE Jobs.job_ID = T1.job_ID
                      )
           OR EXISTS (
                      SELECT * 
                        FROM ConfirmFailedJobs AS T1
                       WHERE Jobs.job_ID = T1.job_ID
                      );
    

    Here's the CTE idea expanded (pseudo SQL):

    WITH CreateFailedJobs (job_ID, ...)
    AS
    (
     SELECT Jobs.job_ID, ...
       FROM ...
      WHERE CreateJob.endtime is NULL 
            AND DATEDIFF(hour, .SomeTable.buy_date, getdate()) 
                   > Vendor.expected_create_hours
    ), 
    TransmitFailedJobs (job_ID, ...)
    AS
    (
     SELECT Jobs.job_ID, ...
       FROM ...
      WHERE TransmitJob.endtime is NULL 
            AND DATEDIFF(hour, CreateJob.endtime, getdate()) 
                   > Vendor.expected_transmit_hours
    ), 
    AcknowledgeFailedJobs (job_ID, ...)
    AS
    (
     SELECT Jobs.job_ID, ...
       FROM ...
      WHERE AcknowledgeJob.endtime is NULL 
            AND DATEDIFF(hour, TransmitJob.endtime, getdate()) 
                   > Vendor.expected_acknowledge_hours
    ), 
    ConfirmFailedJobs (job_ID, ...)
    AS
    (
     SELECT Jobs.job_ID, ...
       FROM ...
      WHERE ConfirmJob.endtime is NULL 
            AND DATEDIFF(hour, AcknowledgeJob.endtime, getdate()) 
                   > Vendor.expected_confirm_hours)
    )
    SELECT Jobs.job_ID, ...
      FROM Jobs
     WHERE EXISTS (
                   SELECT * 
                     FROM CreatedFailedJobs AS T1
                    WHERE Jobs.job_ID = T1.job_ID
                   )
           OR EXISTS (
                      SELECT * 
                        FROM TransmitFailedJobs AS T1
                       WHERE Jobs.job_ID = T1.job_ID
                      )
           OR EXISTS (
                      SELECT * 
                        FROM AcknowledgeFailedJobs AS T1
                       WHERE Jobs.job_ID = T1.job_ID
                      )
           OR EXISTS (
                      SELECT * 
                        FROM ConfirmFailedJobs AS T1
                       WHERE Jobs.job_ID = T1.job_ID
                      );
    

    I'm not sure that this would be good for performance but readability is the aim and anyhow this report can be run offline.