Search code examples
sql-servert-sqlrow-number

ROW_NUMBER to generate on PARTITION with duplicate Order By value


I am accessing a view in SQL Server that is getting stored as following from an application

SELECT *
FROM
    (
        VALUES 
            (1, 'Open', { ts '2020-01-10 23:56:12' }), 
            (1, 'Started', { ts '2020-01-10 23:56:12' }), 
            (1, 'Hold', { ts '2020-01-10 23:56:12' }), 
            (1, 'Sent', { ts '2020-01-11 15:33:22' }), 
            (1, 'Complete', { ts '2020-01-11 15:36:22' }), 
            (1, 'Closed', { ts '2020-01-13 16:43:33' }), 
            (2, 'Open', { ts '2020-02-22 06:43:33' }), 
            (2, 'Started', { ts '2020-02-22 06:43:33' }), 
            (2, 'Clarify', { ts '2020-03-23 08:33:53' }), 
            (2, 'Closed', { ts '2020-03-24 08:33:53' })
        ) t (WO, WO_Status, WO_Status_Date)

This generates the following

| WO | WO_Status | WO_Status_Date          |
+----+-----------+-------------------------+
| 1  | Open      | 2020-01-10 23:56:12.000 |
| 1  | Started   | 2020-01-10 23:56:12.000 |
| 1  | Hold      | 2020-01-10 23:56:12.000 |
| 1  | Sent      | 2020-01-11 15:33:22.000 |
| 1  | Complete  | 2020-01-11 15:36:22.000 |
| 1  | Closed    | 2020-01-13 16:43:33.000 |
| 2  | Open      | 2020-02-22 06:43:33.000 |
| 2  | Started   | 2020-02-22 06:43:33.000 |
| 2  | Clarify   | 2020-03-23 08:33:53.000 |
| 2  | Closed    | 2020-03-24 08:33:53.000 |

All I want is to create a row number/ WO as per the inherent order of the rows.

So my desired output is

| WO | WO_Status | WO_Status_Date          | Order |
+----+-----------+-------------------------+-------+
| 1  | Open      | 2020-01-10 23:56:12.000 | 1     |
| 1  | Started   | 2020-01-10 23:56:12.000 | 2     |
| 1  | Hold      | 2020-01-10 23:56:12.000 | 3     |
| 1  | Sent      | 2020-01-11 15:33:22.000 | 4     |
| 1  | Complete  | 2020-01-11 15:36:22.000 | 5     |
| 1  | Closed    | 2020-01-13 16:43:33.000 | 6     |
| 2  | Open      | 2020-02-22 06:43:33.000 | 1     |
| 2  | Started   | 2020-02-22 06:43:33.000 | 2     |
| 2  | Clarify   | 2020-03-23 08:33:53.000 | 3     |
| 2  | Closed    | 2020-03-24 08:33:53.000 | 4     |

To get to this, I tried the following

SELECT
    a.*,
    ROW_NUMBER() OVER (PARTITION BY (a.[WO]) ORDER BY a.[WO_Status_Date] ASC) AS [Natural Order]
FROM
    (SELECT *
     FROM
         (VALUES 
            (1, 'Open', { ts '2020-01-10 23:56:12' }), 
            (1, 'Started', { ts '2020-01-10 23:56:12' }), 
            (1, 'Hold', { ts '2020-01-10 23:56:12' }), 
            (1, 'Sent', { ts '2020-01-11 15:33:22' }), 
            (1, 'Complete', { ts '2020-01-11 15:36:22' }), 
            (1, 'Closed', { ts '2020-01-13 16:43:33' }), 
            (2, 'Open', { ts '2020-02-22 06:43:33' }), 
            (2, 'Started', { ts '2020-02-22 06:43:33' }), 
            (2, 'Clarify', { ts '2020-03-23 08:33:53' }), 
            (2, 'Closed', { ts '2020-03-24 08:33:53' })
        ) t (WO, WO_Status, WO_Status_Date)
    ) a

which generates this result:

| WO | WO_Status | WO_Status_Date          | Natural Order |
+----+-----------+-------------------------+---------------+
| 1  | Started   | 2020-01-10 23:56:12.000 | 1             |
| 1  | Hold      | 2020-01-10 23:56:12.000 | 2             |
| 1  | Open      | 2020-01-10 23:56:12.000 | 3             |
| 1  | Sent      | 2020-01-11 15:33:22.000 | 4             |
| 1  | Complete  | 2020-01-11 15:36:22.000 | 5             |
| 1  | Closed    | 2020-01-13 16:43:33.000 | 6             |
| 2  | Started   | 2020-02-22 06:43:33.000 | 1             |
| 2  | Open      | 2020-02-22 06:43:33.000 | 2             |
| 2  | Clarify   | 2020-03-23 08:33:53.000 | 3             |
| 2  | Closed    | 2020-03-24 08:33:53.000 | 4             |

ORDER BY is overwriting the inherent order for rows with duplicate values. Is there a way to get passed this get the desired result.


Solution

  • First, you should know that when it comes to relational databases, there's no such thing as the “Natural order”. It's a misconception.
    In fact, it's such a common misconception I've written a blog post about it because even though it's refuted so many times by so many professionals in so many websites, there are still way too many developers that falls victim to this misconception.

    Now, having said that, I assume the different statuses should be ordered by business logic order rules - and that can be done using either a table for statuses that contains a column to sort them by, or by using a case expression in the order by clause.

    This SQL statement will give you the results you want (Assuming I've got the order of the different statuses correctly):

    SELECT  WO, 
            WO_Status, 
            WO_Status_Date,
            ROW_NUMBER() OVER(PARTITION BY WO ORDER BY 
                WO_Status_Date, 
                CASE WO_Status 
                    WHEN 'Open' Then 1
                    WHEN 'Started' Then 2
                    WHEN 'Hold' Then 3
                    WHEN 'Clarify' Then 4
                    WHEN 'Sent' Then 5
                    WHEN 'Complete' Then 6
                    WHEN 'Closed' Then 7
                END) As [Natural Order]
    FROM
    (
        VALUES 
            (1, 'Open', { ts '2020-01-10 23:56:12' }), 
            (1, 'Started', { ts '2020-01-10 23:56:12' }), 
            (1, 'Hold', { ts '2020-01-10 23:56:12' }), 
            (1, 'Sent', { ts '2020-01-11 15:33:22' }), 
            (1, 'Complete', { ts '2020-01-11 15:36:22' }), 
            (1, 'Closed', { ts '2020-01-13 16:43:33' }), 
            (2, 'Open', { ts '2020-02-22 06:43:33' }), 
            (2, 'Started', { ts '2020-02-22 06:43:33' }), 
            (2, 'Clarify', { ts '2020-03-23 08:33:53' }), 
            (2, 'Closed', { ts '2020-03-24 08:33:53' })
        ) t (WO, WO_Status, WO_Status_Date)