Search code examples
sqlsql-servert-sqlrowpartitioning

select row with most information in conjunction with PARTITION BY


I have a table with duplicates and I want to partition by the ID and select only the row with the most information (where the most fields contain values).

+----+------+------+-------+---------+-------+
| ID | Name | City |  Zip  | Address | Phone |  
+----+------+------+-------+---------+-------+
|  1 | Joe  |      |       |         |       |  
|  1 | Joe  | DC   | 11111 |         |       |  
| 2  | Pete | NY   |       |         |       |  
|  2 | Pete | NY   | 10000 |         | 202-  |  
|  3 | Max  |      |       |         |       |  
| 3  | Max  |      |       |         |       |  
|  4 | Sean | MIA  |       |         |       |  
|  4 | Sean | MIA  |       | 1 blvd  |       |  
|  4 | Sean |      | 12345 |         | 305-  |  
|    |      |      |       |         |       |  
+----+------+------+-------+---------+-------+

This is my goal:

+----+------+------+-------+---------+-------+---------+
| ID | Name | City |  Zip  | Address | Phone | Row_num |
+----+------+------+-------+---------+-------+---------+
|  1 | Joe  | DC   | 11111 |         |       |       1 |
|  2 | Pete | NY   | 10000 |         | 202-  |       1 |
| 3  | Max  |      |       |         |       |       1 |
|  4 | Sean | MIA  |       | 1 blvd  |       |       1 |
|    |      |      |       |         |       |         |
+----+------+------+-------+---------+-------+---------+

For Joe it is obvious that I want the second row with where the city and zip information is given.

For Pete I also want to display the second record because it contains more information.

For Max it doesn't matter what row I choose because both records have the same values.

For Sean I can either take the second or third row because the second record has 3 fields which contain value (name, city, address) and the third record has also three filled fields (name, zip, phone). So It doesn't matter which record I want to select from Sean.



How can I partition my table and select the row with the most information about each person?


Solution

  • If the columns are all strings, you can simplify the logic using apply:

    select t.*
    from (select t.*,
                 row_number() over (partition by t.id order by v.cnt desc) as seqnum
          from t cross apply
               (select count(*)
                from (values (name), (city), (zip), (address), (phone)) v(col)
                where col is not null
               ) v(cnt)
        ) t
    where seqnum = 1;
    

    If you want to adapt this for empty strings, you can change the where to where col is not null and col <> ''.