
MySQL duplicate result with join


I have a table, order_status_history, that stores one row for every order status change, and I want to get the latest entry for a given order status.

Tables involved (simplified):

orders
| id     | 
| 1      |  

order_products
| id     | order_id | productid | quantity| 
| 1      | 1        | 1         | 1       | 
| 2      | 1        | 1         | 2       | 
| 3      | 1        | 1         | 2       | 
| 4      | 1        | 1         | 10      |  

order_status_history
| id     | order_id | order_status_id | updatedat              | 
| 1      | 1        | 1               | 2017-05-18 18:45:50    | 
| 2      | 1        | 2               | 2017-05-19 18:45:50    | 
| 3      | 1        | 3               | 2017-05-20 18:45:50    | 
| 4      | 1        | 2               | 2017-05-21 18:45:50    | 
| 5      | 1        | 3               | 2017-05-22 18:45:50    | 

What I need exactly is:

If the latest order status is 3 (status 4 means cancelled or otherwise no longer available), get only the last entry for each order id, where "last" means the most recent updatedat value.

This is the query I tried, but it returns 2 rows for the same order id and status 3:

SELECT o.id, 
       osh.updatedat, 
       op.quantity 
FROM   order_products op
       LEFT JOIN order_status_history osh
              ON osh.order_id = op.order_id
       LEFT JOIN orders o
              ON o.id = op.order_id
WHERE  op.productid = 1 
       AND (SELECT osh.order_status_id 
            FROM   order_status_history osh 
            WHERE  osh.order_id = o.id 
            ORDER  BY osh.updatedat DESC 
            LIMIT  1) = 3 
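
To see where the extra rows come from, here is a minimal sketch that loads the sample data into an in-memory SQLite database and runs the query above. SQLite is only a stand-in for MySQL here (assumption: the SQL used behaves the same in both), and the table and column names are adjusted to the schema (order_products, order_id). Because the join to order_status_history is not restricted to a single row, every order_products row is paired with every status change of its order.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: these
# queries use only SQL that behaves the same in both).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE order_products (
    id INTEGER PRIMARY KEY, order_id INTEGER, productid INTEGER, quantity INTEGER);
CREATE TABLE order_status_history (
    id INTEGER PRIMARY KEY, order_id INTEGER, order_status_id INTEGER, updatedat TEXT);
INSERT INTO orders VALUES (1);
INSERT INTO order_products VALUES (1, 1, 1, 1), (2, 1, 1, 2), (3, 1, 1, 2), (4, 1, 1, 10);
INSERT INTO order_status_history VALUES
    (1, 1, 1, '2017-05-18 18:45:50'),
    (2, 1, 2, '2017-05-19 18:45:50'),
    (3, 1, 3, '2017-05-20 18:45:50'),
    (4, 1, 2, '2017-05-21 18:45:50'),
    (5, 1, 3, '2017-05-22 18:45:50');
""")

# The query from the question. osh.order_status_id is added to the SELECT
# list only to make the fan-out visible; the inner alias is renamed to h.
rows = conn.execute("""
SELECT o.id, osh.order_status_id, osh.updatedat, op.quantity
FROM order_products op
LEFT JOIN order_status_history osh ON osh.order_id = op.order_id
LEFT JOIN orders o ON o.id = op.order_id
WHERE op.productid = 1
  AND (SELECT h.order_status_id
       FROM order_status_history h
       WHERE h.order_id = o.id
       ORDER BY h.updatedat DESC
       LIMIT 1) = 3
""").fetchall()

# 4 order_products rows x 5 history rows for order 1
print(len(rows))  # 20

# The order_products row with quantity 1 comes back twice with status 3,
# once for each status-3 entry in the history.
print(sorted(r[2] for r in rows if r[3] == 1 and r[1] == 3))
# ['2017-05-20 18:45:50', '2017-05-22 18:45:50']
```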

Solution

  • I assume that a row cannot exist in order_products or order_status_history without a matching row in orders, so I would start the joins from orders, which makes the query easier to follow. Moreover, there is no need for LEFT JOIN: we are fetching rows for a specific product (so matching rows must exist in order_products) and a specific order status (so matching rows must exist in order_status_history). So I change every LEFT JOIN in the query to an INNER JOIN.

    Next, to get the row with the latest updatedat value for a particular status, we can use a derived table. This subquery fetches the latest updatedat value for every order, considering only rows whose status is 3.

    Finally, we join this derived table back to the main tables, matching on order_id and updatedat, so that only the data corresponding to the latest updatedat value is returned.

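    It is also generally good practice to move conditions on a joined table from the WHERE clause into that join's ON clause, for readability. With an INNER JOIN the result is the same either way. But if you later change an INNER JOIN to a LEFT JOIN, a condition left in WHERE would discard the NULL-extended rows and effectively turn the LEFT JOIN back into an INNER JOIN; keeping it in ON avoids that unintended filtering.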


    Try the following query:

    SELECT 
      o.id, 
      op.quantity, 
      osh.updatedat 
    FROM 
      orders AS o 
    JOIN order_products AS op 
      ON op.order_id = o.id AND 
         op.productid = 1 
    JOIN order_status_history AS osh 
      ON osh.order_id = o.id 
    JOIN (SELECT order_id, 
                 MAX(updatedat) AS max_updated_at
          FROM order_status_history 
          WHERE order_status_id = 3
          GROUP BY order_id
         ) AS dt 
      ON dt.order_id = o.id AND 
         dt.max_updated_at = osh.updatedat;
    

    Result (one row per order_products row for product 1, each paired only with the latest status-3 entry):

    | id  | quantity | updatedat           |
    | --- | -------- | ------------------- |
    | 1   | 1        | 2017-05-22 18:45:50 |
    | 1   | 2        | 2017-05-22 18:45:50 |
    | 1   | 2        | 2017-05-22 18:45:50 |
    | 1   | 10       | 2017-05-22 18:45:50 |
    

    If you don't need any column from order_status_history other than updatedat, you can simplify the query further by dropping the join with order_status_history and taking the value directly from the derived table:

    Query #2

    SELECT 
      o.id, 
      op.quantity, 
      dt.max_updated_at AS updatedat 
    FROM 
      orders AS o 
    JOIN order_products AS op 
      ON op.order_id = o.id AND 
         op.productid = 1 
    JOIN (SELECT order_id, 
                 MAX(updatedat) AS max_updated_at
          FROM order_status_history 
          WHERE order_status_id = 3
          GROUP BY order_id
         ) AS dt 
      ON dt.order_id = o.id;
    

    Result:

    | id  | quantity | updatedat           |
    | --- | -------- | ------------------- |
    | 1   | 1        | 2017-05-22 18:45:50 |
    | 1   | 2        | 2017-05-22 18:45:50 |
    | 1   | 2        | 2017-05-22 18:45:50 |
    | 1   | 10       | 2017-05-22 18:45:50 |
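
One caveat: both queries above pick the latest status-3 entry per order, but they do not check that 3 is still the order's current status, which the question asks for (status 4 means cancelled). Below is a minimal sketch of one way to add that check with the same derived-table technique: take the latest updatedat per order over all statuses, join it back to order_status_history, and keep the order only if that latest row has order_status_id = 3. It runs on an in-memory SQLite database as a stand-in for MySQL; order 2 (status 3, then 4) is hypothetical data added only to show the difference. Timestamps are stored as ISO-formatted text, so MAX() on them picks the most recent one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE order_products (
    id INTEGER PRIMARY KEY, order_id INTEGER, productid INTEGER, quantity INTEGER);
CREATE TABLE order_status_history (
    id INTEGER PRIMARY KEY, order_id INTEGER, order_status_id INTEGER, updatedat TEXT);
-- order 1 is the sample data; order 2 is hypothetical (status 3, then cancelled with 4)
INSERT INTO orders VALUES (1), (2);
INSERT INTO order_products VALUES
    (1, 1, 1, 1), (2, 1, 1, 2), (3, 1, 1, 2), (4, 1, 1, 10), (5, 2, 1, 7);
INSERT INTO order_status_history VALUES
    (1, 1, 1, '2017-05-18 18:45:50'),
    (2, 1, 2, '2017-05-19 18:45:50'),
    (3, 1, 3, '2017-05-20 18:45:50'),
    (4, 1, 2, '2017-05-21 18:45:50'),
    (5, 1, 3, '2017-05-22 18:45:50'),
    (6, 2, 3, '2017-05-23 18:45:50'),
    (7, 2, 4, '2017-05-24 18:45:50');
""")

# Query #2 from the answer: latest status-3 entry per order.
latest_status3 = conn.execute("""
SELECT o.id, op.quantity, dt.max_updated_at AS updatedat
FROM orders AS o
JOIN order_products AS op ON op.order_id = o.id AND op.productid = 1
JOIN (SELECT order_id, MAX(updatedat) AS max_updated_at
      FROM order_status_history
      WHERE order_status_id = 3
      GROUP BY order_id) AS dt ON dt.order_id = o.id
""").fetchall()

# Variant: latest entry per order over all statuses, kept only if it is status 3.
current_status3 = conn.execute("""
SELECT o.id, op.quantity, osh.updatedat
FROM orders AS o
JOIN order_products AS op ON op.order_id = o.id AND op.productid = 1
JOIN (SELECT order_id, MAX(updatedat) AS max_updated_at
      FROM order_status_history
      GROUP BY order_id) AS dt ON dt.order_id = o.id
JOIN order_status_history AS osh
  ON osh.order_id = dt.order_id
 AND osh.updatedat = dt.max_updated_at
 AND osh.order_status_id = 3
""").fetchall()

print(sorted({r[0] for r in latest_status3}))   # [1, 2]  (cancelled order 2 included)
print(sorted({r[0] for r in current_status3}))  # [1]
```

If two history rows of the same order can share the same updatedat, this join can still return duplicates, so a tiebreaker (for example the highest id) would be needed in that case.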
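
The earlier advice about keeping join conditions in ON rather than WHERE matters once a join becomes a LEFT JOIN. This minimal sketch runs the same LEFT JOIN with the productid = 1 condition first in ON and then in WHERE. It uses an in-memory SQLite database as a stand-in for MySQL, and order 2, which contains only a product 2, is hypothetical data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE order_products (
    id INTEGER PRIMARY KEY, order_id INTEGER, productid INTEGER, quantity INTEGER);
INSERT INTO orders VALUES (1), (2);
-- hypothetical data: order 2 contains only product 2
INSERT INTO order_products VALUES (1, 1, 1, 1), (2, 2, 2, 5);
""")

# Condition in ON: orders without product 1 are kept, with NULL columns.
in_on = conn.execute("""
SELECT o.id, op.quantity
FROM orders AS o
LEFT JOIN order_products AS op ON op.order_id = o.id AND op.productid = 1
ORDER BY o.id
""").fetchall()

# Condition in WHERE: it is applied after the join, so every row without
# product 1 (including NULL-extended rows, since NULL = 1 is not true) is
# dropped, and the LEFT JOIN returns what an INNER JOIN would.
in_where = conn.execute("""
SELECT o.id, op.quantity
FROM orders AS o
LEFT JOIN order_products AS op ON op.order_id = o.id
WHERE op.productid = 1
ORDER BY o.id
""").fetchall()

print(in_on)     # [(1, 1), (2, None)]
print(in_where)  # [(1, 1)]
```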