Search code examples
sqluniquedbase

I need a unique subset of data from my table using SQL


I am working with a Dbase table via SQL commands. I have the following free table to work with.

import1

account, phone, subaccount, locationid

I need the first occurrence of each uinique combination of account and phone. No one field in the table is unique. I can get a partial subset of data using the distinct qualifier but i need the other fields that relate to the record its selecting also.

Can this be done? Thanks

Edit: I found out that I need to qualify the selected records must stay intact.

Example:

Import1

001 123-4567 123 0110

001 123-0001 234 0220

001 123-4567 456 0011

002 222-2222 010 0110

003 333-3333 333 0330

should return

import1

001 123-4567 123 0110

001 123-0001 234 0220

002 222-2222 010 0110

003 333-3333 333 0330

That's my intent here.


Solution

  • First, no matter how you slice it, you need a unique key. It can be a combination of columns, but you absolutely must have some means of uniquely identifying each row. Without that, your best recourse is to add such a constraint. In addition to that constraint, this type of request is best served by also having a single unique column. However, if you do not have such a column, you can create it from the columns that do comprise the unique key by concatenating them together:

    Select T1.account, T1.phone, T1.subaccount, T1.locationId....
    From "Table" As T1
    Where (Cast(T1.UniqueCol1 As varchar(100)) 
        + "|" + Cast(T2.UniqueCol2 As varchar(100)
        ... ) = (
                Select Min( Cast(T2.UniqueCol1 As varchar(100))
                            + "|" + Cast(T2.UniqueCol2 As varchar(100))
                            ... 
                            ) 
                From "Table" As T2
                Where T2.account = T1.account
                    And T2.phone = T1.phone
                )
    

    In this case UniqueCol1 represents one of the columns of the unique key, UniqueCol2 represents the next and so on. This will not be a fast query by any means. Having a single column guaranteed to be unique would make this problem significantly simpler. With that, you can do something akin to Mike M's solution:

    Select T1.account, T1.phone, T1.subaccount, T1.locationId....
    From "Table" As T1
    Where UniqueCol = (
                        Select Min( T2.UniqueCol ) 
                        From "Table" As T2
                        Where T2.account = T1.account
                            And T2.phone = T1.phone
                        )
    

    An important point to be made here is that in both solutions above "first" is being determined simply by the lowest sorted key value found. If "first" needs to be determined by something else, like a datetime column, you need to mention that your in your post.

    EDIT

    Given your addition that this is an import, the simplest solution is to add an auto-incrementing column to your staging table. In SQL Server this would be an IDENTITY column but other database products have an equivalent. If you do that, then the last solution I presented above will do the trick (simply replace UniqueCol with the name of your Identity column).