Search code examples
mysqlsql

SQL Query Still having duplicates after group by


SELECT *
FROM `eBayorders`
WHERE (`OrderIDAmazon` IS NULL
       OR `OrderIDAmazon` = "null")
  AND `Flag` = "True"
  AND `TYPE` = "GROUP"
  AND (`Carrier` IS NULL
       OR `Carrier` = "null")
  AND LEFT(`SKU`, 1) = "B"
  AND datediff(now(), `TIME`) < 4
  AND (`TrackingInfo` IS NULL
       OR `TrackingInfo` = "null")
  AND `STATUS` = "PROCESSING"
GROUP BY `Name`,
         `SKU`
ORDER BY `TIME` ASC LIMIT 7

I am trying to make sure that none of the names and skus will show up in the same result. I am trying to group by name and then sku, however I ran into the problem where a result showed up that has the same name and different skus, which I dont want to happen. How can I fix this query to make sure that there is always distinct names and skus in the result set?!

For example say I have an Order:

Name: Ben Z, SKU : B000334, oldest
Name: Ben Z, SKU : B000333, second oldest
Name: Will, SKU: B000334, third oldest
Name: John, SKU: B000036, fourth oldest

The query should return only:
Name: Ben Z, SKU : B000334, oldest
Name: John, SKU: B000036, fourth oldest

This is because all of the Names should only have one entry in the set along with SKU.


Solution

  • SELECT T1.*
    FROM eBayorders T1
    JOIN
      ( SELECT `Name`,
               `SKU`,
               max(`TIME`) AS MAX_TIME
       FROM eBayorders
       WHERE (`OrderIDAmazon` IS NULL OR `OrderIDAmazon` = "null") AND `Flag` = "True" AND `TYPE` = "GROUP" AND (`Carrier` IS NULL OR `Carrier` = "null") AND LEFT(`SKU`, 1) = "B" AND datediff(now(), `TIME`) < 4 AND (`TrackingInfo` IS NULL OR `TrackingInfo` = "null") AND `STATUS` = "PROCESSING"
       GROUP BY `Name`,
                `SKU`) AS dedupe ON T1.`Name` = dedupe.`Name`
    AND T1.`SKU` = dedupe.`SKU`
    AND T1.`Time` = dedupe.`MAX_TIME`
    ORDER BY `TIME` ASC LIMIT 7
    

    Your database platform should have complained because your original query had items in the select list which were not present in the group by (generally not allowed). The above should resolve it.

    An even better option would be the following if your database supported window functions (MySQL doesn't, unfortunately):

    SELECT *
    FROM
      ( SELECT *,
               row_number() over (partition BY `Name`, `SKU`
                                  ORDER BY `TIME` ASC) AS dedupe_rank
       FROM eBayorders
       WHERE (`OrderIDAmazon` IS NULL OR `OrderIDAmazon` = "null") AND `Flag` = "True" AND `TYPE` = "GROUP" AND (`Carrier` IS NULL OR `Carrier` = "null") AND LEFT(`SKU`, 1) = "B" AND datediff(now(), `TIME`) < 4 AND (`TrackingInfo` IS NULL OR `TrackingInfo` = "null") AND `STATUS` = "PROCESSING" ) T
    WHERE dedupe_rank = 1
    ORDER BY T.`TIME` ASC LIMIT 7