Search code examples
mysqlsqlunique

SELECT without duplicates unless interupted by other values


Given the following data:

name | temp
-----------
hoi  | 15
hoi  | 15
hoi  | 16
hoi  | 15
hej  | 13
hoi  | 13

I would like to select the data in the given two columns without duplicates, However I do want to keep duplicates that are duplicates if they where interrupted by another value:

name | temp
-----------
hoi  | 15 // selected
hoi  | 15 // ignored duplicate
hoi  | 15 // ignored duplicate
hoi  | 16 // selected
hoi  | 15 // selected because while not being unique it follows a different value
hoi  | 15 // ignored duplicate
hej  | 13 // selected
hoi  | 13 // selected
hoi  | 13 // ignored duplicate
hoi  | 14 // selected
hoi  | 13 // selected because while not being unique it follows a different value

This question was hard to formulate for me given English is not my native tongue, Feel free to edit the question or ask for clarifications.

Edit: There is an id field and a datetime field.

Edit 2: I use mySQL 5.7


Solution

  • Since you are using MySQL 5.7, which doesn't support analytical functions, you will need to use variables to store the values of temp and name, from the previous row:

    SELECT  t.ID,
            t.Name,
            t.Temp
    FROM    (   SELECT  t.*,
                        IF(@temp = t.temp AND @name = t.Name, 1, 0) AS IsDuplicate,
                        @temp:= t.temp,
                        @name:= t.Name
            FROM    YourTable AS t
            CROSS JOIN (SELECT @temp := 0, @name := '') AS v
            ORDER BY t.ID
            ) AS t
    WHERE   t.IsDuplicate = 0
    ORDER BY ID;
    

    Example on DB<>Fiddle

    The key parts are (not in the order in which they appear, but in the order in which it is logical to think about it).

    (1) Initialise the variables, and order by ID (or whatever field(s) you like) to ensure variables are assigned in the correct order

    CROSS JOIN (SELECT @temp := 0, @name := '') AS v
    ORDER BY t.ID
    

    (2) Check if the values stored in the variables matches the current row, and flag with a 1 or a 0

    IIF(@temp = t.temp AND @name = t.Name, 1, 0) AS IsDuplicate
    

    (3) Assign the values of temp and name in the current row to the variables, so they can be checked against the next row:

    @temp:= t.temp,
    @name:= t.Name
    

    (4) Remove duplicates from the final data set:

    WHERE   t.IsDuplicate = 0;
    

    To go one further, you could change the IsDuplicate flag to be a group marker, and use GROUP BY, so you can find out how many records there were in total, while still not displaying duplicates:

    SELECT  MIN(ID) AS FirstID,
            t.Name,
            t.Temp,
            COUNT(*) AS Records,
            MAX(ID) AS LastID
    FROM    (   SELECT  t.*,
                        @group:= IF(@temp = t.temp AND @name = t.Name, @group, @group + 1) AS GroupID,
                        @temp:= t.temp,
                        @name:= t.Name
            FROM    YourTable AS t
            CROSS JOIN (SELECT @temp := 0, @name := '', @group:= 0) AS v
            ORDER BY t.ID
            ) AS t
    GROUP BY t.GroupID, t.Name, t.Temp
    ORDER BY t.GroupID;
    

    Example on DB<>Fiddle

    This may be surplus to requirements, but it can be useful as you are able to extract a lot more information than when just identifying duplicate rows.

    Finally if/when you upgrade to version 8.0 or newer, you will be able to use ROW_NUMBER(), or if you move to any other DBMS that supports ROW_NUMBER() (which is most nowadays), then you can use the following:

    SELECT  MIN(ID) AS FirstID,
            t.Name,
            t.Temp,
            COUNT(*) AS Records,
            MAX(ID) AS LastID
    FROM    (   SELECT  t.*,
                        ROW_NUMBER() OVER(ORDER BY ID) - 
                            ROW_NUMBER() OVER(PARTITION BY Temp, Name ORDER BY ID) AS GroupID
            FROM    YourTable AS t
            ORDER BY t.ID
            ) AS t
    GROUP BY t.GroupID, t.Name, t.Temp
    ORDER BY t.GroupID;
    

    Example on DB<>Fiddle