Search code examples
mysqlgroup-concat

group_concat to segregate overlapping dates and combine names


Let me explain the current situation:
Table structure and overview of data:
I have a table with patient_ids, drug_name, episode_no for each drug and their start and end dates. This is a snapshot of only a single patient. So this patient is using 2 drugs X and Y. The first episode for drug X starts on 2013-01-22 and ends on 2013-04-22. Similarly, since the second drug is also drug X, it is the second episode. When the drug changes, it is considered as the first episode for the the new drug and so on.

PATIENT_ID DRUG EPISODE EPISODE_START EPISODE_END
773 X 1 2013-01-22 00:00:00 2013-04-22 00:00:00
773 X 2 2013-06-02 00:00:00 2014-03-12 00:00:00
773 Y 1 2013-10-28 00:00:00 2014-01-22 00:00:00

Requirement:
I need to create drug regimens for each of the patients. I'll explain how: If we plot these data in a timeline, we will have something like this:

2013-01-22              2013-04-22
|------------X--------------|
2013-06-02                             2014-03-12
|------------------X-------------------|
2013-10-28              2014-01-22
|----------Y------------|

Now the regimens need to be created in the following order:

2013-01-22              2013-04-22
|------------X--------------|
2013-06-02                             2014-03-12
|------X-----|---------X+Y------------|-----X------|
2013-10-28              2014-01-22
|----------X+Y-----------|

In this situation, the regimens will be combination of drugs which are used in a particular period of time.
Regimen 1: Drug X from dates 2013-01-22 to 2013-04-22,
Regimen 2: Drug X from dates 2013-06-22 to 2013-10-28,
Regimen 3: Drug X+Y from dates 2013-10-28 to 2014-01-22, and finally
Regimen 4: Drug X from dates 2014-01-22 to 2014-03-12.

Finally, in the end the table would be something like this:

PATIENT_ID DRUG_TAKEN REGIMEN_NO REGIMEN_START REGIMEN_END
773 X 1 2013-01-22 00:00:00 2013-04-22 00:00:00
773 X 2 2013-06-02 00:00:00 2013-10-28 00:00:00
773 X+Y 3 2013-10-28 00:00:00 2014-01-22 00:00:00
773 X 4 2014-01-22 00:00:00 2014-03-12 00:00:00

NOTE: There is one date in the middle: 2013-04-22 to 2013-06-02, where there were no drugs which were taken by the patient.

I don't understand how to use group_concat to segregate the dates and combine the drugs. Can anyone please help me on this. Thank you in advance.


Solution

  • I'm using an archaic version of MySQL, so you'll need to update this to take advantage of the tools available in 8+, in order to construct an equivalent to my calendar table, and the regimen numbering...

    DROP TABLE IF EXISTS  my_table;
    
    CREATE TABLE my_table
    (PATIENT_ID INT NOT NULL
    ,DRUG CHAR(1) NOT NULL
    ,EPISODE INT NOT NULL
    ,EPISODE_START DATE NOT NULL
    ,EPISODE_END DATE NULL
    ,PRIMARY KEY(patient_id,drug,episode)
    );
    
    INSERT INTO my_table VALUES
    (773    ,'X',1,'2013-11-28','2013-12-12'),
    (773    ,'X',2,'2013-12-20','2014-01-06'),
    (773    ,'Y',1,'2013-12-28','2014-01-03');
    

    Query...

    SELECT patient_id
         , MIN(dt) r_start
         , MAX(dt) r_end
         , drugs
         , i regimen
      FROM
        (
        SELECT dt
         , patient_id
         , drugs
         , CASE WHEN @prev_patient = patient_id 
                THEN CASE WHEN @prev_drugs = drugs 
                          THEN CASE WHEN @prev_dt = dt - INTERVAL 1 DAY 
                                    THEN @i:=@i 
                                    ELSE @i:=@i+1 END
                          ELSE @i:=@i+1 END
                ELSE @i:=@i+1 END i
         , @prev_patient := patient_id
         , @prev_drugs := drugs
         , @prev_dt := dt
          FROM 
          (
           SELECT dt
                       , x.patient_id
                       , GROUP_CONCAT(drug ORDER BY drug) drugs
                    FROM dates 
                    JOIN my_table x
                      ON dt BETWEEN episode_start AND episode_end
                   GROUP
                      BY dt
                       , patient_id
                   ORDER
                      BY patient_id, dt
            ) n JOIN (SELECT @prev_patient := null,@prev_drugs := null,@prev_dt := null,@i:=0) vars
            ) m 
            GROUP 
               BY patient_id
                , drugs
                , i
              ORDER BY patient_id, regimen;
    
    +------------+------------+------------+-------+---------+
    | patient_id | r_start    | r_end      | drugs | regimen |
    +------------+------------+------------+-------+---------+
    |        773 | 2013-11-28 | 2013-12-12 | X     |       1 |
    |        773 | 2013-12-20 | 2013-12-27 | X     |       2 |
    |        773 | 2013-12-28 | 2014-01-03 | X,Y   |       3 |
    |        773 | 2014-01-04 | 2014-01-06 | X     |       4 |
    +------------+------------+------------+-------+---------+