Search code examples
sql-servergroupinggaps-and-islandsdense-rankgranularity

How to aggregate to group level, order and enumerate groups chronologically? (in SQL Server)


Within a hospital encounter, a patient may be administered several different formulations of a medication, as shown here:

Encounter Medication Administration Adm_Num
1 A 8/31/21 11:33 AM 1
1 B 8/31/21 6:25 PM 2
1 C 9/1/21 8:55 AM 3
1 D 9/1/21 10:00 PM 4
1 B 9/2/21 11:27 AM 5
1 B 9/2/21 10:00 PM 6
1 B 9/3/21 6:15 AM 7
1 B 9/3/21 3:30 PM 8
1 D 9/3/21 8:30 PM 9

The task: For each encounter, I need to enumerate each formulation of the medication, like this:

Encounter Medication Administration Adm_Num Formulation
1 A 8/31/21 11:33 AM 1 1
1 B 8/31/21 6:25 PM 2 2
1 C 9/1/21 8:55 AM 3 3
1 D 9/1/21 10:00 PM 4 4
1 B 9/2/21 11:27 AM 5 5
1 B 9/2/21 10:00 PM 6 5
1 B 9/3/21 6:15 AM 7 5
1 B 9/3/21 3:30 PM 8 5
1 D 9/3/21 8:30 PM 9 6

I need to preserve the administration-level granularity in order to report on those data (and I'm in good shape there).

One strategy I tried was a CTE to find a medication-level value for each medication, such as the first administration instant, then apply that to the more granular administration-level data, like this:

WITH f as (
SELECT a.Encounter, a.Medication, MIN(Administration) as First_Adm,
 DENSE_RANK() OVER (PARTITION Encounter, Medication ORDER BY Administration) as Formulation
FROM a
GROUP BY a.Encounter, a.Medication
)
SELECT a.Encounter, a.Medication, a.Administration, a.Adm_Num, f.[First_Adm], f.Formulation
FROM a
INNER JOIN f ON (a.Encounter = f.Encounter AND a.Medication = f.Medication)
GROUP BY a.Encounter, a.Medication, a.Administration
ORDER BY a.Encounter, a.Medication, a.Administration

That worked well for the first few medications with single administrations, but it mishandled the reappearance of Medications B and D later in the encounter; it recognized Medication B from before, gave it the earlier MIN(Administration), and mislabeled it Formulation 2, as shown here:

Encounter Medication Administration Adm_Num First_Adm Formulation
1 A 8/31/21 11:33 AM 1 8/31/21 11:33 AM 1
1 B 8/31/21 6:25 PM 2 8/31/21 6:25 PM 2
1 C 9/1/21 8:55 AM 3 9/1/21 8:55 AM 3
1 D 9/1/21 10:00 PM 4 9/1/21 10:00 PM 4
1 B 9/2/21 11:27 AM 5 8/31/21 6:25 PM 2
1 B 9/2/21 10:00 PM 6 8/31/21 6:25 PM 2
1 B 9/3/21 6:15 AM 7 8/31/21 6:25 PM 2
1 B 9/3/21 3:30 PM 8 8/31/21 6:25 PM 2
1 D 9/3/21 8:30 PM 9 9/1/21 10:00 PM 4

This is close, but not quite there. The remaining challenge is to account for repeated formulations.

How can I group the administration-level data by medication formulation groups, order the formulations chronologically (making use of the time data), and assign a sequential value to each formulation, even if some are repeated?

Thanks in advance for your help.


Solution

  • This is a type of "gaps and islands" problem.

    One method is to first use lag or lead to check the adjacent row to identify where the data (Medication) changes, followed by summing these values for all preceding rows for each row in the results to generate the desired sequence:

    with c as (
        select *, 
          case when 
            Lag(medication) over (partition by encounter order by Administration) = medication 
          then 0 else 1 end Changed
        from t
    )
    select Encounter, Medication, Administration, Adm_Num,
      Sum(changed) over(partition by Encounter order by Administration) Formulation
    from c;