Search code examples
sqlpostgresqldate-arithmetic

PostgreSQL age() function: different/unexpected results when landing in dfferent month


Today, I've encountered an unexplainable result in PostgreSQL 9.6 while running this query:

SELECT age('2018-06-30','2018-05-19') AS one,
       age('2018-07-01','2018-05-20') AS two; 

Expected results for both columns: 1 mon 11 days. However, only for the interval from 2018-05-19 to 2018-06-30, I get what I expect, while for 2018-05-20 till 2018-07-01 I'll get one day more: 1 mon 12 days

I don't get why this is the case and in my understanding, between 2018-05-20 2018-07-01 is just an interval of 1 mon 11 days and the Postgres result here is wrong.

I cannot find any in-depth information on how exactly the PostgreSQL-age(timestamp,timestamp) function works. However, I assumed that function does something like: Go from the start date in month steps forward till you reach the end month. From there, go to the day of the end date. Sum up months and days.

So, in my understanding, this is what should go on under the hood in my case (sorry, for being so verbose here, but I feel it's necessary):

Start at 2018-05-19. Go one month forward. Land at 2018-06-19. Walk N days forward till you've reached 2018-06-30:

1 day: 20
2 days: 21
3 days: 22
4 days: 23
5 days: 24
6 days: 25
7 days: 26
8 days: 27
9 days: 28
10 days: 29
11 days: 30

= 1 month 11 days.

For the time between 2018-05-20 and 2018-07-01 it should be almost the same:

Start at 2018-05-20. Go one month forward. Land at 2018-06-20. Walk N days forward till you've reached 2018-07-01:

1 day: 21
2 days: 22
3 days: 23
4 days: 24
5 days: 25
6 days: 26
7 days: 27
8 days: 28
9 days: 29
10 days: 30
11 days: 1

= 1 month 11 days.

Is this my mistake or one of PostgreSQL? Are there alternative functions/algorithms which work the way I described/expect?


Solution

  • age is calculated by the timestamptz_age function in src/backend/utils/adt/timestamp.c. The comment says:

    /* timestamptz_age()
     * Calculate time difference while retaining year/month fields.
     * Note that this does not result in an accurate absolute time span
     *  since year and month are out of context once the arithmetic
     *  is done.
     */
    

    The code first converts the arguments to struct pg_tm variables tm1 and tm2 (struct pg_tm is similar to the C library's struct tm, but has additional time zone fields) and then calculates the difference tm per field.

    In the case of age('2018-07-01','2018-05-20'), the relevant fields of that difference would look like this:

    tm_mday = -19
    tm_mon  =   2
    tm_year =   0
    

    Now negative fields are adjusted. for tm_mday, the code looks like this:

    while (tm->tm_mday < 0)
    {
        if (dt1 < dt2)
        {
            tm->tm_mday += day_tab[isleap(tm1->tm_year)][tm1->tm_mon - 1];
            tm->tm_mon--;
        }
        else
        {
            tm->tm_mday += day_tab[isleap(tm2->tm_year)][tm2->tm_mon - 1];
            tm->tm_mon--;
        }
    }
    

    Since dt1 > dt2, the else branch is taken, and the code adds the number of days in May (31) and reduces the month by 1, ending up with

    tm_mday = 12
    tm_mon  =  1
    tm_year =  0
    

    That is the result you get.

    Now at first glance it seems that tm2->tm_mon isn't the right month to choose, and it would have been better to take the previous month of the left argument:

    day_tab[isleap(tm1->tm_year)][(tm1->tm_mon + 10) % 12]
    

    But I cannot say if that choice would be better in all cases, and in any event the comment indemnifies the function, so I'd hesitate to call it a bug.

    You might want to take it up with the hackers mailing list.