I have a table order_status_history that stores one row for every order status change, and I want to get the last entry for a given order status.
Tables involved (simplified):
orders

| id |
| --- |
| 1 |

order_products

| id | order_id | productid | quantity |
| --- | -------- | --------- | -------- |
| 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 2 |
| 3 | 1 | 1 | 2 |
| 4 | 1 | 1 | 10 |

order_status_history

| id | order_id | order_status_id | updatedat |
| --- | -------- | --------------- | ------------------- |
| 1 | 1 | 1 | 2017-05-18 18:45:50 |
| 2 | 1 | 2 | 2017-05-19 18:45:50 |
| 3 | 1 | 3 | 2017-05-20 18:45:50 |
| 4 | 1 | 2 | 2017-05-21 18:45:50 |
| 5 | 1 | 3 | 2017-05-22 18:45:50 |
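To make the example reproducible, the sample data above could be created with something like the following. This is only a sketch: the column types (INT, DATETIME), the NOT NULL constraints and the primary keys are assumptions, since the simplified tables do not specify them.

```sql
-- Assumed schema: types and keys are guesses based on the sample rows.
CREATE TABLE orders (
  id INT PRIMARY KEY
);

CREATE TABLE order_products (
  id        INT PRIMARY KEY,
  order_id  INT NOT NULL,
  productid INT NOT NULL,
  quantity  INT NOT NULL
);

CREATE TABLE order_status_history (
  id              INT PRIMARY KEY,
  order_id        INT NOT NULL,
  order_status_id INT NOT NULL,
  updatedat       DATETIME NOT NULL
);

-- Sample rows copied from the tables above.
INSERT INTO orders (id) VALUES (1);

INSERT INTO order_products (id, order_id, productid, quantity) VALUES
  (1, 1, 1, 1),
  (2, 1, 1, 2),
  (3, 1, 1, 2),
  (4, 1, 1, 10);

INSERT INTO order_status_history (id, order_id, order_status_id, updatedat) VALUES
  (1, 1, 1, '2017-05-18 18:45:50'),
  (2, 1, 2, '2017-05-19 18:45:50'),
  (3, 1, 3, '2017-05-20 18:45:50'),
  (4, 1, 2, '2017-05-21 18:45:50'),
  (5, 1, 3, '2017-05-22 18:45:50');
```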
What exactly I need is:
If the last order status is "3" (status "4" means cancelled or otherwise no longer available), then for each order id get only the last entry, where "last" means the most recent according to the updatedat column.
This is the query I tried, but it returns 2 rows for the same order id and status 3:
SELECT o.id,
       osh.updatedat,
       op.quantity
FROM order_products op
LEFT JOIN order_status_history osh
       ON osh.order_id = op.order_id
LEFT JOIN orders o
       ON o.id = op.order_id
WHERE op.productid = 1
  AND (SELECT osh.order_status_id
       FROM order_status_history osh
       WHERE osh.order_id = o.id
       ORDER BY osh.updatedat DESC
       LIMIT 1) = 3
I believe that there cannot be any entry in the order_products and order_status_history tables without a corresponding entry in the orders table. So I would change the order of the table joins to start from orders, for readability. Moreover, there is no need for LEFT JOIN, as we are fetching rows for a specific product (hence, rows must exist in the order_products table) and at a specific order status (hence, rows must exist in the order_status_history table). So I will change every LEFT JOIN in the query to INNER JOIN.
Now, to get the row with the last updatedat value for a particular status, we have to use a Derived Table. In this subquery, we fetch the latest updatedat value for every order where the status is 3. We then join this subquery's result set to the main tables, to get the data corresponding to the row with the latest updatedat value.
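As a quick check, the derived table can be run on its own against the sample data from the question. This is only a sketch of the subquery in isolation; the alias max_updated_at is just an illustrative name.

```sql
-- Latest status-3 timestamp per order.
-- With the sample data this returns a single row:
--   order_id = 1, max_updated_at = 2017-05-22 18:45:50
SELECT order_id,
       MAX(updatedat) AS max_updated_at
FROM order_status_history
WHERE order_status_id = 3
GROUP BY order_id;
```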
Also, it is generally good practice to move filter conditions from the WHERE clause to the join's ON clause, for readability. In future, if you change an INNER JOIN to a LEFT JOIN, the join keeps working as intended, whereas a WHERE condition on the joined table would filter out the unmatched rows and effectively turn it back into an INNER JOIN.
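To illustrate the difference, here is a small sketch against the sample tables from the question. The value productid = 2 is hypothetical; it is chosen only because no such product exists in the sample data.

```sql
-- Filter in the ON clause: order 1 is still returned, with quantity NULL,
-- because no order_products row matches productid = 2.
SELECT o.id, op.quantity
FROM orders AS o
LEFT JOIN order_products AS op
  ON op.order_id = o.id
 AND op.productid = 2;

-- Filter in the WHERE clause: the NULL-extended row fails the condition,
-- so the query returns no rows, just as an INNER JOIN would.
SELECT o.id, op.quantity
FROM orders AS o
LEFT JOIN order_products AS op
  ON op.order_id = o.id
WHERE op.productid = 2;
```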
Try the following query:
SELECT
  o.id,
  op.quantity,
  osh.updatedat
FROM
  orders AS o
JOIN order_products AS op
  ON op.order_id = o.id AND
     op.productid = 1
JOIN order_status_history AS osh
  ON osh.order_id = o.id
JOIN (SELECT order_id,
             MAX(updatedat) AS max_updated_at
      FROM order_status_history
      WHERE order_status_id = 3
      GROUP BY order_id
     ) AS dt
  ON dt.order_id = o.id AND
     dt.max_updated_at = osh.updatedat;
Result:
| id | quantity | updatedat |
| --- | -------- | ------------------- |
| 1 | 1 | 2017-05-22 18:45:50 |
| 1 | 2 | 2017-05-22 18:45:50 |
| 1 | 2 | 2017-05-22 18:45:50 |
| 1 | 10 | 2017-05-22 18:45:50 |
If you don't need any column other than updatedat from the order_status_history table, you can optimize the query further by getting rid of the join with the order_status_history table:
Query #2
SELECT
  o.id,
  op.quantity,
  dt.max_updated_at AS updatedat
FROM
  orders AS o
JOIN order_products AS op
  ON op.order_id = o.id AND
     op.productid = 1
JOIN (SELECT order_id,
             MAX(updatedat) AS max_updated_at
      FROM order_status_history
      WHERE order_status_id = 3
      GROUP BY order_id
     ) AS dt
  ON dt.order_id = o.id;
Result:
| id | quantity | updatedat |
| --- | -------- | ------------------- |
| 1 | 1 | 2017-05-22 18:45:50 |
| 1 | 2 | 2017-05-22 18:45:50 |
| 1 | 2 | 2017-05-22 18:45:50 |
| 1 | 10 | 2017-05-22 18:45:50 |
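Note that both queries above return the latest status-3 row even if the order later moved to another status (for example 4). If the requirement is read strictly, so that the most recent status overall must be 3, one possible variation is to compute the latest row per order without the status filter, and then keep it only when its status is 3. This is a sketch under the assumption that updatedat is unique per order; ties on the timestamp could produce duplicate rows.

```sql
SELECT
  o.id,
  op.quantity,
  osh.updatedat
FROM
  orders AS o
JOIN order_products AS op
  ON op.order_id = o.id AND
     op.productid = 1
-- Latest history timestamp per order, regardless of status.
JOIN (SELECT order_id,
             MAX(updatedat) AS max_updated_at
      FROM order_status_history
      GROUP BY order_id
     ) AS dt
  ON dt.order_id = o.id
-- Keep the order only if that latest row has status 3.
JOIN order_status_history AS osh
  ON osh.order_id = dt.order_id AND
     osh.updatedat = dt.max_updated_at AND
     osh.order_status_id = 3;
```

With the sample data the latest row for order 1 has status 3, so this returns the same four rows as the queries above. If a later row with status 4 were added, it would return nothing for that order.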