Tags: sql, sql-server-2008, t-sql, cursor

Nasty SQL query: is there a way I can find first and last rows in a grouped set without a cursor?


I have data that looks like this:

[Image: data sample]

For records with the same ClientId, I need to group consecutive rows (by CpId) where PlaceId is not null, then find the first and last row in each group so that I can take the DateAdmitted value from the first row and the DateDischarged value from the last row. The data above would be organized like this and then filtered for the values I need:

[Image not included]

Using the above example, I would want the following based on ClientId:

ClientId    FirstCpIdInSet    DateAdmitted    LastCpIdInSet    DateDischarged
-----------------------------------------------------------------------------
1967        NULL              NULL            NULL             NULL
1983        45                1986-12-29      45               1987-10-09
1983        47                1990-10-01      49               2009-04-12
1983        52                2009-08-31      52               2009-11-30
1988        62                1997-12-15      65               2000-01-07

ClientId 1967 could be excluded from the result set, since it never has a row where PlaceId is not null. A couple of other things to note:

  • This is taken from a temp table that is created with CpId as the IDENTITY, and the table is populated with a strict ORDER BY, so CpId is sequential in the order needed.
  • For those rows that have PlaceId and are consecutive for a single ClientId, the DateAdmitted should equal the DateDischarged in the previous row.

I'd really like to be able to do this without a cursor, if possible, but after puzzling on it for two days I just can't figure it out. This is on SQL Server 2008 R2.


Solution

  • You don't say what you are basing first and last on. Let me assume it is CPID. You can do this with ranking functions:

    select ClientID, PlaceId,
           max(CpID) as maxCpId,
           -- DateAdmitted from the first row (lowest CpId) in each group
           min(case when seqnumasc = 1 then DateAdmitted end) as DateAdmitted,
           -- DateDischarged from the last row (highest CpId) in each group
           max(case when seqnumdesc = 1 then DateDischarged end) as DateDischarged
    from (select t.*,
                 row_number() over (partition by clientID, placeID order by cpid) as seqnumasc,
                 row_number() over (partition by clientID, placeID order by cpid desc) as seqnumdesc
          from t
         ) t
    where placeID is not null
    group by ClientID, placeID
    

    This adds sequence numbers to identify the first and last rows in each group. However, why can't you just use min and max on the date admitted and date discharged?

    Based on enhanced information . . .

    Now the question appears to be to define the "sets" of records according to the following conditions:

    • Consecutive CPIDs
    • Same client, same company
    • Place not null

    If so, the following will give you a "set id". This uses a trick for bringing together consecutive values, based on subtracting a sequential number from the CPID. This difference is a constant for consecutive values, providing a set id.

    select clientid, setid,
           min(DateAdmitted) as DateAdmitted,
           max(DateDischarged) as DateDischarged,
           min(cpid) as minCPID,
           max(cpid) as maxCPID
    from (select t.*,  -- pass the date columns through so the outer query can aggregate them
                 row_number() over (partition by clientid, setid order by cpid) as seqnum,
                 count(*) over (partition by clientid, setid) as setsize
          from (select t.*,
                       -- consecutive CPIDs give the same difference, which serves as the set id
                       (cpid - row_number() over (partition by clientid order by cpid)
                       ) as setid
                from t
                where PlaceID is not NULL
               ) t
         ) t
    group by clientid, setid
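
To see why the difference works as a set id, here is a minimal Python sketch of the same arithmetic. It is an illustration, not a replacement for the query. The rows are hypothetical: the CpIds and the first and last dates of each set come from the expected output in the question, while the PlaceId values, the intermediate dates, and the rows with a null PlaceId are invented. The function name `find_sets` is also made up for this example. Unlike the query, it reads DateAdmitted from the first row and DateDischarged from the last row of each set, which gives the same result as min/max as long as the dates increase with CpId.

```python
from itertools import groupby

# Hypothetical rows: (ClientId, CpId, PlaceId, DateAdmitted, DateDischarged).
# PlaceId values, intermediate dates and the null-PlaceId rows are invented.
rows = [
    (1967, 40, None, None, None),
    (1983, 45, 7, "1986-12-29", "1987-10-09"),
    (1983, 46, None, None, None),
    (1983, 47, 7, "1990-10-01", "1995-03-01"),
    (1983, 48, 9, "1995-03-01", "2001-06-15"),
    (1983, 49, 7, "2001-06-15", "2009-04-12"),
    (1983, 50, None, None, None),
    (1983, 51, None, None, None),
    (1983, 52, 3, "2009-08-31", "2009-11-30"),
    (1988, 62, 5, "1997-12-15", "1998-02-01"),
    (1988, 63, 5, "1998-02-01", "1998-09-30"),
    (1988, 64, 8, "1998-09-30", "1999-05-20"),
    (1988, 65, 5, "1999-05-20", "2000-01-07"),
]


def find_sets(rows):
    """Return (ClientId, FirstCpId, DateAdmitted, LastCpId, DateDischarged) per set."""
    # Equivalent of "where PlaceID is not NULL", ordered by client and CpId.
    kept = sorted((r for r in rows if r[2] is not None), key=lambda r: (r[0], r[1]))
    result = []
    for client_id, client_rows in groupby(kept, key=lambda r: r[0]):
        # Equivalent of cpid - row_number() over (partition by clientid order by cpid).
        # Within a run of consecutive CpIds both terms grow by 1, so the
        # difference stays constant; a gap in CpId makes it jump.
        tagged = [(r[1] - n, r) for n, r in enumerate(client_rows, start=1)]
        for _set_id, members in groupby(tagged, key=lambda pair: pair[0]):
            members = [r for _, r in members]
            first, last = members[0], members[-1]
            result.append((client_id, first[1], first[3], last[1], last[4]))
    return result


for row in find_sets(rows):
    print(row)
```

For client 1983 the kept CpIds are 45, 47, 48, 49, 52 with row numbers 1 to 5, so the differences are 44, 45, 45, 45, 47: three distinct set ids, one per run of consecutive CpIds. Client 1967 has no row with a PlaceId and drops out entirely.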