Search code examples
sqlsql-serversql-server-2000

sql union remove "semi-duplicates"


I'm doing a union like so

select name, price from products where project = 10  // prio 1
union
select name, price from products where customer = 5  // prio 2
union
select name, price from products where standard = 9  // prio 3

edit: changed the where-clause to make it a bit more complicated

this will typically give me back

+-----+------+-----------+--------------+
|(NO) | name |  price    |  (prio)      |
+-----+------+-----------+--------------+
|  1  |  a   |    10     |  (1)         |
|  2  |  b   |     5     |  (1)         |

|  3  |  a   |    13     |  (2)         |
|  4  |  b   |     2     |  (2)         |

|  5  |  a   |     1     |  (3)         |
|  6  |  b   |     5     |  (3)         |
|  7  |  c   |     3     |  (3)         |
+-----+------+-----------+--------------+

I understand that e.g. row no 1 and 3 are not duplicates and will not be removed by the union statement. However, this is exactly what I want to do. That is, if a name (e.g. "a") gets returned by the first select statement (prio 1) I don't want any other "a":s to get into the result set from the select statements of higher priority.

i.e, I want this:

+-----+------+-----------+--------------+
|(NO) | name |  price    |  (prio) |
+-----+------+-----------+--------------+
|  1  |  a   |    10     |  (1)         |
|  2  |  b   |     5     |  (1)         |

|  7  |  c   |     3     |  (3)         |
+-----+------+-----------+--------------+

Is this possible?

I tried using group by but this requires me to use a MIN, MAX, AVG etc. on the price which I don't want to do, i.e:

select name, avg(price) from (...original query...) group by name
// this is not ok since I donnot want the avg price, I want the "first" price

I am using MS SQL 2000. Can I use something like first(..) as aggregate function in the group by? When trying this, I get an error:

select name, first(price) from (...original query...) group by name
// error: 'first' is not a recognized built-in function name.

Thanks!


Solution

  • For SQL Server 2005+:

    WITH records
    AS
    (
        SELECT  name, price, prio,
                ROW_NUMBER() OVER (PARTITION BY name
                                    ORDER BY prio ASC) rn
        FROM    products
    )
    SELECT  Name, Price
    FROM    records
    WHERE   rn = 1
    

    Try this for SQL Server 2000:

    SELECT  a.*
    FROM    products a
            INNER JOIN
            (
                SELECT  name, MIN(prio) min_prio
                FROM    products 
                WHERE   prio IN (1,2,3)
                GROUP   BY  name
            ) b ON a.name = b.name AND
                    a.prio = b.min_prio
    

    for better performance, add a compound index on column (name, prio).