Search code examples
sqlt-sqlsubquerywindow-functions

Summing up only the values of previous rows with the same ID


As I am preparing my data for predicting no-shows at a hospital, I ran into the following problem: In the query below I tried to get the number of shows/no-shows relatively shown to the number of appointments (APPTS). INDICATION_NO_SHOW means whether a patient showed up at a appointment. 0 means show, and 1 means no-show.

with t1 as 
(
select
    PAT_ID                                                              
    ,APPT_TIME
    ,APPT_ID                                                                
    ,ROW_NUMBER () over(PARTITION BY PAT_ID order by pat_id,APPT_TIME) as [TOTAL_APPTS] 
    ,INDICATION_NO_SHOW
from appointments
)
,
t2 as
(
 t1.PAT_ID
 ,t1.APPT_TIME
 ,INDICATION_NO_SHOW
 ,sum(INDICATION_NO_SHOW) over(order by PAT_ID, APPT_TIME ) as TOTAL_NO_SHOWS
 ,TOTAL_APPT
 from t1
 )
  SELECT *
  ,(TOTAL_APPT- TOTAL_NO_SHOWS) AS TOTAL_SHOWS
  FROM T2
  order by PAT_ID, APPT_TIME

This resulted into the following dataset:

 PAT ID  APPT_TIME INDICATION_NO_SHOW  TOTAL_SHOWS TOTAL_NO_SHOWS  TOTAL_APPTS 
 1        1-1-2001     0                     1            0              1       
 1        1-2-2001     0                     2            0              2       
 1        1-3-2001     1                     2            1              3       
 1        1-4-2001     0                     3            1              4      
 2        1-1-2001     0                     0            1              1       
 2        2-1-2001     0                     1            1              2       
 2        2-2-2001     1                     1            2              3       
 2        2-3-2001     0                     2            2              4       

As you can see my query only worked for patient 1, and then it also counts the no-shows for patient 1 for patient 2. So individually it worked for 1 patient, but not over the whole dataset.

The TOTAL_APPTs column worked out, because it counted the number of appts the patient had at the moment of that given appt. My question is: How do I succesfully get these shows and no-shows succesfully added up (as I did for patient 1)? I'm completely aware why this query doesn't work, I'm just completely in the blue on how to fix it..


Solution

  • I think that you can just use window functions. You seem to be looking for window sums of shows and no shows per patient, so:

    select
        pat_id,
        appt_time,
        indication_no_show,
        sum(1 - indication_no_show) 
            over(partition by pat_id order by appt_time) total_shows,
        sum(indication_no_show) 
            over(partition by pat_id order by appt_time) total_no_shows
    from appointments