Search code examples
postgresqlself

How to achieve matching with same table in postgresql?


I want to count number of sites partition by country on the basis of below criteria:-

For each TrialSite with Trial_Start_Date = X and Site_Activated = Y, you should be counting all rows that meet these conditions:

  • Trial_Start_Date <= Y, AND
  • TrialSite_Activation_Date >= X i.e. all rows where there is some overlapping period with that row's Trial Start Date to TrialSite Activation Date.

sample data example:

Trial id  start_date  country  site_id trialSite_activation_date

Trial A    01-01-2017  India  site1 01-02-2017 ----> 2 (only overlaps with itself, and with 2nd row)

Trial A  01-01-2017  India  site 2 01-04-2017 ----> 4 (overlaps with all rows, including itself)

Trial B    02-03-2017 India  site3 01-04-2017 ----> 3 (does not overlap with first row, since Trial_Start_Date > 01-02-2017)

Trial B   02-03-2017 India  site4 01-04-2017 ----> 3

This data can contain multiple countries and this logic needs to be applied with same country records.


Solution

  • You could use the “overlaps” operator && on dateranges:

    SELECT t1, count(*)
    FROM trial t1
       JOIN trial t2
          ON daterange(t1.trial_start_date, t1.trialsite_activation_date, '[]')
             && daterange(t2.trial_start_date, t2.trialsite_activation_date, '[]')
    GROUP BY t1;
    
                          t1                       | count
    -----------------------------------------------+-------
     ("Trial A",2017-01-01,India,site1,2017-02-01) |     2
     ("Trial A",2017-01-01,India,site2,2017-04-01) |     4
     ("Trial B",2017-03-02,India,site3,2017-04-01) |     3
     ("Trial B",2017-03-02,India,site4,2017-04-01) |     3
    (4 rows)
    

    Instead of using the whole-row reference t1 in the SELECT list, you can specify individual columns there, but then you gave to list them in the GROUP BY clause as well.