I've written a query for a report which is designed to pull some of information from a system transaction log (transaction_log), while also looking for a Master value from a second table (location_table) - this does not always exist, as sometimes the master_location or location_id fields can be NULL.
I have come up with the below query which produces my desired results, although it takes much longer to run the report since adding the 'master_location' column to the query.
I believe the performance of the query/report can probably be improved by ditching the subquery within the SELECT statement and using a join instead, but I am rusty with SQL at the moment and can't get it working.
Can anyone please suggest improvements to this query to improve the performance?
SELECT
transaction_type,
description,
id_num,
product_id,
quantity,
location_id,
CASE WHEN transaction_log.location_id IS NOT NULL
THEN ( SELECT TOP 1
master_location_id
FROM location_table t1
WHERE t1.location_id = transaction_log.source_location_id
AND master_location_id IS NOT NULL
)
ELSE NULL
END as master_location,
dummy_value,
employee_id,
FROM
transaction_log WITH (NOLOCK)
I tried a join as below, which is returning over 2000 rows too many (compared to the above query). Should I maybe be filtering down the results further in a WHERE clause?
SELECT
transaction_type,
description,
id_num,
product_id,
quantity,
location_id,
master_location_id,
dummy_value,
employee_id,
FROM
transaction_log t1 WITH (NOLOCK)
INNER JOIN ( SELECT location_id,
master_location_id
FROM location_table) t2
ON t1.location_id = t2.location_id
Situation
transaction_log.location_id
can sometimes be NULL
, so you want to make sure you get transaction log entries that have NULL
locations. That means you need an outer join.location_table.master_location_id
will always be the same as each other or NULL
, so you want to narrow down this list so there is only one row per location_id
, master_location_id
combination.master_location_id
can sometimes be NULL
, so, when narrowing down the list, you want to pull away the NULL
values. All of them can be pulled out, as argued in the note below.One Solution
SELECT
T_LOG.transaction_type,
T_LOG.description,
T_LOG.id_num,
T_LOG.product_id,
T_LOG.quantity,
T_LOG.location_id,
U_LOC.master_location_id,
T_LOG.dummy_value,
T_LOG.employee_id,
FROM
transaction_log T_LOG WITH (NOLOCK)
LEFT OUTER JOIN (
SELECT T_LOC.location_id,
T_LOC.master_location_id
FROM location_table T_LOC
WHERE T_LOC.master_location_id IS NOT NULL
GROUP BY T_LOC.location_id,
T_LOC.master_location_id
) U_LOC
ON T_LOG.location_id = U_LOC.location_id
;
Note: Consider the following table values for location_table
:
+-------------+--------------------+
| location_id | master_location_id |
+-------------+--------------------+
| 10 | 3 |
| 10 | 3 |
| 20 | NULL |
| 30 | 7 |
| 30 | NULL |
+-------------+--------------------+
Removing all NULL
master_location_id
rows and grouping by both columns produces this:
+-------------+--------------------+
| location_id | master_location_id |
+-------------+--------------------+
| 10 | 3 |
| 30 | 7 |
+-------------+--------------------+
There is an apparent loss of the row where location_id
is 20. However, if the value were there, master_location_id
would be NULL
. Since it is not there, the LEFT OUTER JOIN
will return a value of NULL
for it anyway. So there is no issue.
There would only be an issue if you wanted to interpret a missing row in location_table
differently from the way you would a NULL
value in location_table.master_location_id
. That is outside the scope of this question.