
MySQL Leap Years


I have a MySQL table called reviews which stores a lot of product reviews, accumulated over many years, each annotated with a date timestamp.
I wanted to draw a bar-chart showing me the number of reviews I have accumulated on each day of the year, hoping to find the important dates for my system.

I used this query

    SELECT DAYOFYEAR(date), COUNT(*) FROM reviews GROUP BY DAYOFYEAR(date);

However, I noticed that this returns 366 rows instead of 365, and realized that I can't use it for the chart as-is: in leap years, every day after February 28th is shifted by one, which skews my data.

For example, Christmas shows up as day #359 in most years, but as day #360 in leap years.
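The offset can be checked with a few lines of Python. This is a minimal sketch, assuming that Python's `tm_yday` (1 for January 1st, up to 365 or 366) uses the same numbering as MySQL's `DAYOFYEAR()`; the `day_of_year` helper and the 2007 to 2009 date range are illustrative choices, not part of the original question.

```python
from datetime import date, timedelta

def day_of_year(d: date) -> int:
    # Hypothetical helper: 1 for January 1st, 365 or 366 for December 31st,
    # the same numbering MySQL's DAYOFYEAR() uses
    return d.timetuple().tm_yday

# Christmas moves by one in a leap year because February 29th comes first
christmas_2007 = day_of_year(date(2007, 12, 25))  # common year
christmas_2008 = day_of_year(date(2008, 12, 25))  # leap year
print(christmas_2007, christmas_2008)  # 359 360

# Any range that contains a leap year produces 366 distinct indexes
start, end = date(2007, 1, 1), date(2009, 12, 31)
indexes = {day_of_year(start + timedelta(days=i)) for i in range((end - start).days + 1)}
print(len(indexes))  # 366
```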

What is the simplest way to resolve this skew?

On a side note, is there any software package that can accept an SQL query and return the results directly as a bar chart (when a bar chart makes sense)?


Solution

  • I haven't tested this, but you could do something like the following:

    SELECT
            # check if the year is a leap year:
            IF((YEAR(date) % 4 = 0 AND YEAR(date) % 100 != 0) OR YEAR(date) % 400 = 0,
                # if so, check whether the date falls after February 28th; if it does, subtract one
                IF(DAYOFYEAR(date) > DAYOFYEAR('2008-02-28'), DAYOFYEAR(date) - 1, DAYOFYEAR(date)),
                # if the year isn't a leap year, just return the regular DAYOFYEAR() value
                DAYOFYEAR(date)) AS day_index,
            COUNT(*) AS review_count
        FROM reviews
        GROUP BY day_index
        ORDER BY day_index;
    

    This merges the counts for February 28th and 29th in leap years, but gives every other date the same index in leap and non-leap years. A more desirable behaviour might be to simply ignore the data from February 29th, which you could do with an additional condition. You could also give February 29th a special index, such as 400, so that it doesn't shift any of the other days.

    A better way might be to group by month and day of month:

    select month(date) m, day(date) d, count(*) from reviews group by m, d order by m, d;
    

    This avoids the problem altogether, but it might be more complicated to process in your application logic. Still, I'd argue that it's a much better way to do it.
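Both approaches can be sketched client-side in Python, for example to sanity-check the SQL against a handful of known dates. This is a minimal sketch, not the answer's method: it assumes the review dates have already been fetched as `datetime.date` objects, and the names `is_leap`, `normalized_day_index`, `count_by_month_day` and the sample dates are hypothetical, chosen for illustration only.

```python
from collections import Counter
from datetime import date

def is_leap(year: int) -> bool:
    # Gregorian rule, the same test as the outer IF() condition in the first query
    return (year % 4 == 0 and year % 100 != 0) or year % 400 == 0

def normalized_day_index(d: date) -> int:
    """Day of year on a 365-day scale; February 29th is merged into February 28th."""
    day = d.timetuple().tm_yday
    # In leap years, every day after February 28th (day 59) moves back by one
    if is_leap(d.year) and day > 59:
        return day - 1
    return day

def count_by_month_day(dates):
    """Equivalent of GROUP BY MONTH(date), DAY(date): keys are (month, day) tuples."""
    return Counter((d.month, d.day) for d in dates)

# Hypothetical review dates, for illustration only
reviews = [date(2007, 12, 25), date(2008, 12, 25), date(2008, 2, 29), date(2009, 2, 28)]

by_index = Counter(normalized_day_index(d) for d in reviews)
by_month_day = count_by_month_day(reviews)

print(by_index[359])  # 2: both Christmases share one index
print(by_index[59])   # 2: Feb 29th, 2008 is merged with Feb 28th, 2009
print(by_month_day[(12, 25)], by_month_day[(2, 29)])  # 2 1
```

Note the design difference: the month/day grouping keeps February 29th as its own `(2, 29)` key instead of merging it, so the chart still has 366 possible bars, but each bar always corresponds to the same calendar date.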