Search code examples
sql-serverstored-proceduresdata-warehouse

T-SQL logic for roll up and group by


I have a question to collapse or roll up data based on the logic below. How can I implement it?

The logic that allows episodes to be condensed into a single continuous care episode is a discharge code of 22 followed by an admission code of 4 on the same day.

continuous care implementation continuous care implementation update

EPN--is a business_key.

episode_continuous_care_key is an artificial key that can be a row number function.

Below is the table structure.

            drop table #source
            CREATE TABLE #source(patidid varchar(20),epn int,preadmitdate datetime,adminttime varchar(10),
            admitcode varchar(10),datedischarge datetime,disctime varchar(10),disccode varchar(10))
            INSERT INTO #source VALUES
             (1849,1,'4/23/2020','7:29',1,'7/31/2020','9:03',22)
            ,(1849,2,'7/31/2020','11:00',4,'7/31/2020','12:09',22)
            ,(1849,3,'7/31/2020','13:10',4,'8/24/2020','10:36',10)
            ,(1849,4,'8/26/2020','12:25',2,null,null,null)
            ,(1850,1,'4/23/2020','7:33',1,'6/29/2020','7:30',22)
            ,(1850,2,'6/29/2020','9:35',4,'7/8/2020','10:51',7)
            ,(1850,3,'7/10/2020','11:51',3,'7/29/2020','9:12',7)
            ,(1850,4,'7/31/2020','11:00',2,'8/6/2020','10:24',22)
            ,(1850,5,'8/6/2020','12:26',4,null,null,null)
            ,(1851,1,'4/23/2020','7:35',1,'6/24/2020','13:45',22)
            ,(1851,2,'6/24/2020','15:06',4,'9/24/2020','15:00',2)
            ,(1851,3,'12/4/2020','8:59',0,null,null,null)
            ,(1852,1,'4/23/2020','7:37',1,'7/6/2020','11:15',20)
            ,(1852,2,'7/8/2020','10:56',0,'7/10/2020','11:46',2)
            ,(1852,3,'7/10/2020','11:47',2,'7/28/2020','13:16',22)
            ,(1852,4,'7/28/2020','15:17',4,'8/4/2020','11:37',22)
            ,(1852,5,'8/4/2020','13:40',4,'11/18/2020','15:43',2)
            ,(1852,6,'12/2/2020','15:23',2,null,null,null)
            ,(1853,1,'4/23/2020','7:40',1,'7/1/2020','8:30',22)
            ,(1853,2,'7/1/2020','14:57',4,'12/4/2020','12:55',7)
            ,(1854,1,'4/23/2020','7:44',1,'7/31/2020','13:07',20)
            ,(1854,2,'8/3/2020','16:30',0,'8/5/2020','9:32',2)
            ,(1854,3,'8/5/2020','10:34',2,'8/24/2020','8:15',22)
            ,(1854,4,'8/24/2020','10:33',4,'12/4/2020','7:30',22)
            ,(1854,5,'12/4/2020','9:13',4,null,null,null)

Solution

  • That Excel sheet image says little about your database design so I invented my own version that more or less resembles your image. With a proper database design the first step of the solution should not be required...

    1. Unpivot timestamp information so that admission timestamp and discharge timestamps become one column.
      I used a common table expression Log1 for this action.
    2. Use the codes to filter out the start of the continuous care periods. Those are the admissions, marked with Code.IsAdmission = 1 in my database design.
      Also add the next period start as another column by using the lead() function.
      These are all the actions from Log2.
    3. Add a row number as continuous care key.
      Using the next period start date, find the current continuous period end date with a cross apply.
      Replace empty period end dates with the current date using the coalesce() function.
      Calculate the difference as the continuous care period duration with the datediff() function.

    Sample data

    create table Codes
    (
      Code int,
      Description nvarchar(50),
      IsAdmission bit
    );
    
    insert into Codes (Code, Description, IsAdmission) values
    ( 1, 'First admission', 1),
    ( 2, 'Re-admission', 1),
    ( 4, 'Campus transfer IN', 0),
    (10, 'Trial visit', 0),
    (22, 'Campus transfer OUT', 0);
    
    create table PatientLogs
    (
      PatientId int,
      AdmitDateTime smalldatetime,
      AdmitCode int,
      DischargeDateTime smalldatetime,
      DischargeCode int
    );
    
    insert into PatientLogs (PatientId, AdmitDateTime, AdmitCode, DischargeDateTime, DischargeCode) values
    (1849, '2020-04-23 07:29', 1, '2020-07-31 09:03', 22),
    (1849, '2020-07-31 11:00', 4, '2020-07-31 12:09', 22),
    (1849, '2020-07-31 13:10', 4, '2020-08-24 10:36', 10),
    (1849, '2020-08-26 12:25', 2, null, null);
    

    Solution

    with Log1 as
    (
      select updt.PatientId,
             case updt.DateTimeType
               when 'AdmitDateTime' then updt.AdmitCode
               when 'DischargeDateTime' then updt.DischargeCode
             end as Code,
             updt.LogDateTime,
             updt.DateTimeType
      from PatientLogs pl
      unpivot (LogDateTime for DateTimeType in (AdmitDateTime, DischargeDateTime)) updt
    ),
    Log2 as (
      select l.PatientId,
             l.Code,
             l.LogDateTime,
             lead(l.LogDateTime) over(partition by l.PatientId order by l.LogDateTime) as LogDateTimeNext
      from Log1 l
      join Codes c
        on c.Code = l.Code
      where c.IsAdmission = 1
    )
    select la.PatientId,
           row_number() over(partition by la.PatientId order by la.LogDateTime) as ContCareKey,
           la.LogDateTime as AdmitDateTime,
           coalesce(ld.LogDateTime, convert(smalldatetime, getdate())) as DischargeDateTime,
           datediff(day, la.LogDateTime, coalesce(ld.LogDateTime, convert(smalldatetime, getdate()))) as ContStay
    from Log2 la -- log admission
    outer apply ( select top 1 l1.LogDateTime
                  from Log1 l1
                  where l1.PatientId = la.PatientId
                    and l1.LogDateTime < la.LogDateTimeNext
                  order by l1.LogDateTime desc ) ld -- log discharge
    order by la.PatientId,
             la.LogDateTime;
    

    Result

    PatientId  ContCareKey  AdmitDateTime     DischargeDateTime  ContStay
    ---------  -----------  ----------------  -----------------  --------
    1849       1            2020-04-23 07:29  2020-08-24 10:36   123
    1849       2            2020-08-26 12:25  2021-02-03 12:49   161
    

    Fiddle to see things in action with intermediate results.