Search code examples
sqldb2

Analyzing Gap Years in SQL


I have a table (myt) in SQL that looks like this:

CREATE TABLE myt (
    Name VARCHAR(50),
    Date_1 DATE,
    Date_2 DATE,
    Year_1 INT,
    Month_1 INT,
    Day_1 INT,
    Year_2 INT,
    Month_2 INT,
    Day_2 INT
);


INSERT INTO myt (ID, Name, Date_1, Date_2, Year_1, Month_1, Day_1, Year_2, Month_2, Day_2) VALUES
( 'p1', '2010-01-01', '2010-05-01', 2010, 1, 1, 2010, 5, 1),
( 'p2', '2010-01-01', '2013-09-09', 2010, 1, 1, 2013, 9, 9);


  Name     Date_1     Date_2 Year_1 Month_1 Day_1 Year_2 Month_2 Day_2
 p1 2010-01-01 2010-05-01   2010       1     1   2010       5     1
 p2 2010-01-01 2013-09-09   2010       1     1   2013       9     9

Task:

  • For each row, I want to find out how many feb-01 appear between date_1 and date_2.

For example:

  • P1: (date_1 = 2010-01-01, date_ 2= 2010-05-01 ) the answer is 1 (feb1 2010)

  • P2: (date_1 = 2010-01-01, date_2= 2013-09-09) the answer is 4 (feb1 2010 , feb1 2011, feb1 2012, feb1 2013)

I tried looking if there were specific SQL functions for this and didn't find any, so I tried to do this using a standardized approach:

SELECT *, 
       (year_2 - year_1 + 
       (CASE WHEN month_1 < 2 OR (month_1 = 2 AND day_1 <= 1) THEN 1 ELSE 0 END) - 
       (CASE WHEN month_2 < 2 OR (month_2 = 2 AND day_2 < 1) THEN 1 ELSE 0 END)) as feb1_count
FROM myt;

The results seem to look correct for these test cases:

Name     Date_1     Date_2 Year_1 Month_1 Day_1 Year_2 Month_2 Day_2 feb1_count
   P1 2010-01-01 2010-05-01   2010       1     1   2010       5     1          1
   P2 2010-01-01 2013-09-09   2010       1     1   2013       9     9          4

Is this the correct way to analyze this problem?


Solution

  • As long as the date you're counting isn't 29th February, your solution would work.

    There's a minute asymmetry in the calculation though, with the terms:

    (month_1 = 2 AND day_1 <= 1)
    (month_2 = 2 AND day_2 < 1)
    

    I think you can make the code symmetric (or maybe anti-symmetric) by adapting it to read:

    SELECT *, 
           (year_2 - year_1 - 1 + 
           (CASE WHEN month_1 < 2 OR (month_1 = 2 AND day_1 <= 1) THEN 1 ELSE 0 END) + 
           (CASE WHEN month_2 > 2 OR (month_2 = 2 AND day_2 >= 1) THEN 1 ELSE 0 END)) as feb1_count
    FROM myt;
    

    As long as date_1 <= date_2, that should work, even for edge cases:

    Date 1 Date 2 Y2-Y1-1 Case 1 Case 2 Answer Status
    2010-02-01 2010-02-01 -1 1 1 1
    2010-01-31 2010-02-02 -1 1 1 1
    2010-01-30 2010-01-31 -1 1 0 0
    2010-02-02 2010-02-02 -1 0 1 0
    2010-02-02 2011-01-31 0 0 0 0
    2010-02-02 2011-02-01 0 0 1 1
    2010-01-01 2013-09-09 2 1 1 4