Search code examples
sqljoingroup-bydistinctself-join

How to write a SQL statement to return Distinct records based on X, Y, and Z columns and the first value in table for Column A


I am struggling to wrap my head around this.

I have a data table like this:

TABLE:

Column_A Column_X Column_Y Column_Z
1234 TOON CITY OFF
1235 STAR CARS OFF
1226 STAR CARS OFF
3234 TOON CITY OFF
4435 STAR CARS OFF
5555 STAR CARS OFF
5555 TOON CITY OFF
3333 STAR CARS OFF
1111 STAR CARS OFF
SELECT Distinct
       TABLE.Column_X,
       TABLE.Column_Y,
       TABLE.Column_Z,
  FROM TABLE

Will return me two records

TOON CITY OFF
STAR CARS OFF

But now I want the first instance of Column A in the table to have the results show like:

TOON CITY OFF 1234
STAR CARS OFF 1235

The thing I'm working on doesn't care about which number I put in a form, but it wants one of them. Its actually looking at like 50 distinct values and has 4 that I need a value but just one of them.

I've tried joining the table back to itself based on its distinct and select the TOP 1 record and using table2.Column_A for the results but that doesn't seem to work.

Group By seems like it might be the solution but I can't get that to work either.

SELECT Column_X, Column_Y, Column_Z, Top 1(Column_A)???
  FROM Table
 GROUP BY Column_X, Column_Y, Column_Z

Solution

  • I would prefer here to simply fetch for example the one with the lowest value in Column_A:

    SELECT Column_X, Column_Y, Column_Z, MIN(Column_A) AS Column_A
    FROM yourtable
    GROUP BY Column_X, Column_Y, Column_Z;
    

    That's easy and short. It will produce this outcome based on your sample data:

    Column_X Column_Y Column_Z Column_A
    STAR CARS OFF 1111
    TOON CITY OFF 1234

    I think this is better because mostly it doesn't matter which entry appears "first" in a table, but we prefer to select such entries having a lowest or highest value.

    Which entry occurs as "first" in a table is usually random and means nothing unless there is an id or sequence which is used as "sorting column".

    But anyway, if we really want to exactly choose the "first" two entries, we can use ROW_NUMBER:

    WITH sub AS
    (SELECT 
      ROW_NUMBER() OVER(PARTITION BY Column_X,  Column_Y, Column_Z 
      ORDER BY Column_X) AS RowNr,
      Column_A, Column_X,Column_Y,Column_Z
    FROM yourtable)
    SELECT Column_X,
           Column_Y,
           Column_Z,
           Column_A
      FROM sub
    WHERE RowNr = 1
    ORDER BY Column_A;
    

    That leads to worse readability, but will produce exactly the same result which you have shown in your question:

    Column_X Column_Y Column_Z Column_A
    TOON CITY OFF 1234
    STAR CARS OFF 1235

    Try out here: db<>fiddle