t-sql date conditional-statements common-table-expression row-number

Conditional Row_Number() for min and maximum date

I ve got a table with data which looks like this:

Table T1

+----+------------+------------+

| ID |   Udate    | last_code  | 
+----+------------+------------+
|  1 | 05/11/2018 | ATTEMPT    |
|  1 | 03/11/2018 | ATTEMPT    |
|  1 | 01/11/2017 | INFO       |
|  1 | 25/10/2016 | ARRIVED    |
|  1 | 22/9/2016  | ARRIVED    |
|  1 | 14/9/2016  |   SENT     | 
|  1 | 1/9/2016   |   SENT     |
+----+------------+------------+
|  2 | 26/10/2016 | RECEIVED   | 
|  2 | 19/10/2016 | ARRIVED    | 
|  2 | 18/10/2016 | ARRIVED    |
|  2 | 14/10/2016 | ANNOUNCED  | 
|  2 | 23/9/2016  | INFO       | 
|  2 | 14/9/2016  | DAMAGE     |
|  2 | 2/9/2016   | SCHEDULED  | 
+----+------------+------------+

Each id has multiple codes at different dates and there is no pattern for them.

Overall I m trying to get the last date and code, but if there is an "ATTEMPT" code, I need to get the first date and that code for each individual ID. Based on the table above, I would get:

+----+------------+------------+

| ID |   Udate    | last_code  | 
|  1 | 03/11/2018 | ATTEMPT    |
|  2 | 26/10/2016 | RECEIVED   | 
+----+------------+------------+

I ve been trying

ROW_NUMBER() OVER (PARTITION BY ID
                                ORDER BY 
                                    (CASE WHEN code = 'ATTEMPT' THEN u_date END) ASC,
                                    (CASE WHEN code_key <> 'ATTEMPT' THEN u_date END) DESC
                                ) as RN

And at the moment I ve been stuck after I use ROW_NUMBER() twice, but can t think of a way to bring them all in the same table.

,ROW_NUMBER() OVER (PARTITION BY id, code order by udate asc) as RN1
,ROW_NUMBER() OVER (PARTITION BY id order by udate desc) AS RN2

I m not very familiar with CTEs and I think it s one of those queries which requires one perhaps..

Thanks.

Solution

I think you have a couple of options before attempting a CTE.

Give these a try, examples below:

DECLARE @TestData TABLE
    (
        [ID] INT
      , [Udate] DATE
      , [last_code] NVARCHAR(100)
    );

INSERT INTO @TestData (
                          [ID]
                        , [Udate]
                        , [last_code]
                      )
VALUES ( 1, '11/05/2018', 'ATTEMPT  ' )
     , ( 1, '11/03/2018', 'ATTEMPT' )
     , ( 1, '11/01/2017', 'INFO' )
     , ( 1, '10/25/2016', 'ARRIVED' )
     , ( 1, '9/22/2016 ', 'ARRIVED' )
     , ( 1, '9/14/2016 ', 'SENT' )
     , ( 1, '9/1/2016  ', 'SENT' )
     , ( 2, '10/26/2016', 'RECEIVED' )
     , ( 2, '10/19/2016', 'ARRIVED' )
     , ( 2, '10/18/2016', 'ARRIVED' )
     , ( 2, '10/14/2016', 'ANNOUNCED' )
     , ( 2, '9/23/2016 ', 'INFO' )
     , ( 2, '9/14/2016 ', 'DAMAGE' )
     , ( 2, '9/2/2016  ', 'SCHEDULED' );

--option 1
--couple of outer apply
--1 - to get the min date for attempt
--2 - to get the max date regardless of the the code
--where clause, using coalesce will pick what date.  Use the date if I have one for code ='ATTEMPT', if not use the max date.
SELECT      [a].*
FROM        @TestData [a]
OUTER APPLY (
                SELECT   [b].[ID]
                       , MIN([b].[Udate]) AS [AttemptUdate]
                FROM     @TestData [b]
                WHERE    [b].[ID] = [a].[ID]
                         AND [b].[last_code] = 'ATTEMPT'
                GROUP BY [b].[ID]
            ) AS [aa]
OUTER APPLY (
                SELECT   [c].[ID]
                       , MAX([c].[Udate]) AS [MaxUdate]
                FROM     @TestData [c]
                WHERE    [c].[ID] = [a].[ID]
                GROUP BY [c].[ID]
            ) AS [cc]
WHERE       [a].[ID] = COALESCE([aa].[ID], [cc].[ID])
            AND [a].[Udate] = COALESCE([aa].[AttemptUdate], [cc].[MaxUdate]);


--use window functions
--Similiar in that we are finding the max Udate and also min Udate when last_code='ATTEMPT'
--Then using COALESCE in the where clause to evaluate which one to use.
--Maybe a little cleaner
SELECT [td].[ID]
     , [td].[Udate]
     , [td].[last_code]
FROM   (
           SELECT [ID]
                , [last_code]
                , [Udate]
                , MAX([Udate]) OVER ( PARTITION BY [ID] ) AS [MaxUdate]
                , MIN(   CASE WHEN [last_code] = 'ATTEMPT' THEN [Udate]
                              ELSE NULL
                         END
                     ) OVER ( PARTITION BY [ID] ) AS [AttemptUdate]
           FROM   @TestData
       ) AS [td]
WHERE  [td].[Udate] = COALESCE([td].[AttemptUdate], [td].[MaxUdate]);

To explain how I got there a little bit, it was primarily base on your requirement:

Overall I m trying to get the last date and code, but if there is an "ATTEMPT" code, I need to get the first date and that code for each individual ID.

So for each ID I needed a way to get:

Minimum Udate for last_code = 'ATTEMPT' per ID - if there was no ATTEMPT we'll get a null
Maximum Udate for all records per ID

If I could determine the above for each record based on ID then my final result set are basically those where the Udate equals my Maximum Udate if the Minimum was null. If the Minimum wasn't null use that instead.

The first option, using 2 outer applies is doing each of the points above.

Minimum Udate for last_code = 'ATTEMPT' per ID - if there was no ATTEMPT we'll get a null:

OUTER APPLY (
                SELECT   [b].[ID]
                       , MIN([b].[Udate]) AS [AttemptUdate]
                FROM     @TestData [b]
                WHERE    [b].[ID] = [a].[ID]
                         AND [b].[last_code] = 'ATTEMPT'
                GROUP BY [b].[ID]
            ) AS [aa]

Outer Apply as I might not have an ATTEMPT record for a given ID, so in those situations it returns NULL.

Maximum Udate for all records per ID:

OUTER APPLY (
                SELECT   [c].[ID]
                       , MAX([c].[Udate]) AS [MaxUdate]
                FROM     @TestData [c]
                WHERE    [c].[ID] = [a].[ID]
                GROUP BY [c].[ID]
            ) AS [cc]

Then the where clause compares what was returned by those to return only the records I want:

    [a].[Udate] = COALESCE([aa].[AttemptUdate], [cc].[MaxUdate]);

I'm using COALESCE to handled and evaluate NULLs. COALESCE will evaluate the fields from left to right and use/return the first non NULL value.

So using this with Udate we can evaluate which Udate value I should use in my filter to satisfy the requirement.

Because if I had an ATTEMPT record field AttemptUdate would have a value and be used in the filter first. If I didn't have an ATTEMPT record AttemptUdate would be NULL so then MaxUdate would be used.

For option 2, similar just going after it a little different.

Minimum Udate for last_code = 'ATTEMPT' per ID - if there was no ATTEMPT we'll get a null:

        MIN(   CASE WHEN [last_code] = 'ATTEMPT' THEN [Udate]
                      ELSE NULL
                 END
             ) OVER ( PARTITION BY [ID] ) AS [AttemptUdate]

Min on Udate, but I use a case statement to evaluate if that records is an ATTEMPT or not. using OVER PARTITION will do that based on how I tell it to partition the data, by ID.

Maximum Udate for all records per ID:

MAX([Udate]) OVER ( PARTITION BY [ID] ) AS [MaxUdate]

Go get me the maximum Udate based on ID, since that's how I told it to partition it.

I do all that in a sub-query to make the where clause easier to work with. Then it's the same as before when filtering:

[td].[Udate] = COALESCE([td].[AttemptUdate], [td].[MaxUdate]);

Using COALESCE to determine which date I should be using and only return the records I want.

With the second option, go a little deeper, If you run just the sub query, you'll see you get for each individual record the 2 main driving points of the requirement:

What's the Max Udate per ID
What's the mint Udate of last_code=ATTEMPT per ID

From there I can just filter on those records satisfying what I was originally looking for, using a COALESCE to simplify my filter.

[td].[Udate] = COALESCE([td].[AttemptUdate], [td].[MaxUdate]);

Use AttemptUdate unless it's NULL then use MaxUdate.