php, datetime, timestamp, julian-date

Bug with converting "Julian dates" to a Unix timestamp and back to DateTime


Today, while working on my project, I found a weird bug. I'm using the symfony/form (5.4.5) component, and a date passed in as 0001-01-01 comes back as 0000-12-30 after the transformation on the Symfony side.

So I did some investigating (I took the code from symfony/form that performs this date transformation) and found something very interesting: dates before the year 1582 (the move from the Julian to the Gregorian calendar) are not transformed correctly.

Below is an example of the code and its output.

    $dateFormat = 2;   // \IntlDateFormatter::MEDIUM
    $timeFormat = -1;  // \IntlDateFormatter::NONE
    $timezone = new \DateTimeZone('UTC');
    $calendar = 1;     // \IntlDateFormatter::GREGORIAN
    $pattern = 'yyyy-MM-dd';

    $dateFormatter = new \IntlDateFormatter(\Locale::getDefault(), $dateFormat, $timeFormat, $timezone, $calendar, $pattern);

    $dates = ['0001-01-01', '0002-01-01', '1000-01-01', '1582-01-01', '1583-01-01', '1600-01-01', '1800-01-01'];
    foreach ($dates as $dateInput) {
        $timestamp = $dateFormatter->parse($dateInput);

        $dateTime = new \DateTime(date('Y-m-d', $timestamp), new \DateTimeZone('Europe/Berlin'));
        $dateOutput = $dateTime->format('Y-m-d');
        echo $dateInput . " " .$dateOutput . PHP_EOL;
    }

Output:

    0001-01-01 0000-12-30 - BAD
    0002-01-01 0001-12-30 - BAD
    1000-01-01 1000-01-06 - BAD
    1582-01-01 1582-01-11 - BAD
    1583-01-01 1583-01-01 - GOOD
    1600-01-01 1600-01-01 - GOOD
    1800-01-01 1800-01-01 - GOOD

Solution

  • I would say this isn't exactly wrong, just a different interpretation, or a different choice of solution to a tricky problem: how do you count backwards in time past the beginning of the calendar you're using? The ultimate error you're seeing is the result of mixing libraries that choose different solutions, and so misinterpret each other's output.

    The first thing to know is that the functions you've used are built on two different libraries:

    • The IntlDateFormatter, as part of the PHP "intl" extension, is built on ICU (International Components for Unicode), maintained by the Unicode Consortium
    • The date function and DateTime class are built on timelib, maintained by Derick Rethans

    The next thing to notice, as you've already pointed out, is that they start disagreeing at the date when the Gregorian calendar was first introduced. We can be more specific than the year: it was introduced in October 1582, with the Julian 4th October being followed by the Gregorian 15th October. Importantly, this means that there are 10 dates which did not exist in those parts of the world which switched immediately.
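
    The size of that gap can be checked with Julian Day Numbers, a continuous day count that does not depend on any calendar. The sketch below is only an illustration and is not part of the original code; it assumes PHP's optional calendar extension is enabled, which provides `juliantojd()` and `gregoriantojd()`.

```php
<?php
// Assumes ext-calendar is loaded. Both functions take (month, day, year) and
// return the Julian Day Number of that date as read in the named calendar.
$lastJulianDay = juliantojd(10, 4, 1582);          // 4 October 1582, Julian
$firstGregorianDay = gregoriantojd(10, 15, 1582);  // 15 October 1582, Gregorian

// Difference in actual days between the two dates, although their labels
// are 11 apart.
echo $firstGregorianDay - $lastJulianDay, PHP_EOL;

// Does the label "5 October", read as a Julian date, name the same day as
// Gregorian "15 October"?
var_dump(juliantojd(10, 5, 1582) === $firstGregorianDay);
```

    This should print `1` and `bool(true)`: the two dates are consecutive days, and a "skipped" label such as 5 October only means something if it is read in the old calendar.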

    Finally, rather than formatting back and forth, let's look at the actual timestamps produced by the two libraries. Note that a Unix timestamp is supposed to represent a number of seconds after (or, in this case, before) the moment in time that is written as 1970-01-01 00:00:00 in the Gregorian calendar and the UTC timezone.

    To make the timestamps easier to read, let's divide them by the number of seconds in a day, which shows how many days each library thinks lie between the given date and 1st Jan 1970.

    $dateFormatter = new \IntlDateFormatter(\Locale::getDefault(), 2, -1, new \DateTimeZone('UTC'), 1, 'yyyy-MM-dd');
    
    $dates = [
        '1582-10-03', '1582-10-04', '1582-10-05', '1582-10-06', '1582-10-07', '1582-10-08', '1582-10-09',
        '1582-10-10', '1582-10-11', '1582-10-12', '1582-10-13', '1582-10-14', '1582-10-15', '1582-10-16'
    ];
    foreach ($dates as $dateInput) {
        $icuTimestamp = $dateFormatter->parse($dateInput);
        $timelibTimestamp = DateTimeImmutable::createFromFormat('Y-m-d|', $dateInput)->getTimestamp();
        
        $icuDays = abs(intval( $icuTimestamp / 60 / 60 / 24 ));
        $timeLibDays = abs(intval( $timelibTimestamp / 60 / 60 / 24 ));
    
        echo "Is {$dateInput} {$icuDays} days or {$timeLibDays} days before 1970?\n";
    }
    

    The result looks like this:

    Is 1582-10-03 141429 days or 141439 days before 1970?
    Is 1582-10-04 141428 days or 141438 days before 1970?
    Is 1582-10-05 141427 days or 141437 days before 1970?
    Is 1582-10-06 141426 days or 141436 days before 1970?
    Is 1582-10-07 141425 days or 141435 days before 1970?
    Is 1582-10-08 141424 days or 141434 days before 1970?
    Is 1582-10-09 141423 days or 141433 days before 1970?
    Is 1582-10-10 141422 days or 141432 days before 1970?
    Is 1582-10-11 141421 days or 141431 days before 1970?
    Is 1582-10-12 141420 days or 141430 days before 1970?
    Is 1582-10-13 141419 days or 141429 days before 1970?
    Is 1582-10-14 141418 days or 141428 days before 1970?
    Is 1582-10-15 141427 days or 141427 days before 1970?
    Is 1582-10-16 141426 days or 141426 days before 1970?
    

    As expected, the two libraries agree that 1582-10-15 was 141427 days before 1970, but disagree about earlier dates:

    • timelib is using a "proleptic Gregorian calendar", assuming that 1582-10-14 was one day longer ago than 1582-10-15, and so on; this is historically inaccurate, but simpler to work with
    • ICU is attempting to interpret the dates as people in Europe at the time would, so that 1582-10-04 is only 1 day longer ago than 1582-10-15. That leaves ten dates in between which, under this interpretation, never existed at all; ICU reads them as Julian dates, with the odd effect that 1582-10-05 gives the same value as 1582-10-15
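
    The two interpretations can be modelled with plain integer arithmetic. The following is a simplified sketch, not the actual code of either library: the function names and constants are made up for illustration, it uses the standard Julian Day Number formulas, and it assumes that the hybrid reading simply treats any label before 1582-10-15 as a Julian date.

```php
<?php
// Hypothetical helper (not from ICU or timelib): Julian Day Number of a
// year-month-day triple, read in the Gregorian or the Julian calendar.
function julianDayNumber(int $year, int $month, int $day, bool $gregorian): int
{
    $a = intdiv(14 - $month, 12);   // 1 for January and February, else 0
    $y = $year + 4800 - $a;
    $m = $month + 12 * $a - 3;      // March = 0 ... February = 11
    $jdn = $day + intdiv(153 * $m + 2, 5) + 365 * $y + intdiv($y, 4);

    return $gregorian
        ? $jdn - intdiv($y, 100) + intdiv($y, 400) - 32045
        : $jdn - 32083;
}

const JDN_UNIX_EPOCH = 2440588;        // 1970-01-01 (Gregorian)
const JDN_GREGORIAN_CUTOVER = 2299161; // 1582-10-15 (Gregorian)

// timelib-style reading: Gregorian rules apply to every date.
function daysBefore1970Proleptic(string $date): int
{
    [$y, $m, $d] = array_map('intval', explode('-', $date));
    return JDN_UNIX_EPOCH - julianDayNumber($y, $m, $d, true);
}

// ICU-style reading: labels that fall before the cut-over are Julian dates.
function daysBefore1970Hybrid(string $date): int
{
    [$y, $m, $d] = array_map('intval', explode('-', $date));
    $gregorianJdn = julianDayNumber($y, $m, $d, true);
    $jdn = $gregorianJdn >= JDN_GREGORIAN_CUTOVER
        ? $gregorianJdn
        : julianDayNumber($y, $m, $d, false);
    return JDN_UNIX_EPOCH - $jdn;
}

foreach (['0001-01-01', '1582-10-04', '1582-10-05', '1582-10-15'] as $date) {
    printf("%s: hybrid %d, proleptic %d\n", $date, daysBefore1970Hybrid($date), daysBefore1970Proleptic($date));
}
```

    For the dates around the cut-over this model gives the same day counts as the table above. For 0001-01-01 the two readings are two days apart (719164 versus 719162), which is how the 0001-01-01 from the question ends up as 0000-12-30 once a timestamp produced under one reading is formatted under the other.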

    The ICU documentation explains this cut-over behaviour, and how to influence it. Although not fully documented, PHP exposes the relevant methods, so it's possible to set the cut-over arbitrarily far in the past to match the timelib behaviour:

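    // Start from a Gregorian calendar and move its Julian-to-Gregorian cut-over
    // as far back as possible, so that every date follows Gregorian rules.
    $cal = IntlGregorianCalendar::createInstance();
    $cal->setGregorianChange(PHP_INT_MIN);
    // Note: the formatter also takes over the time zone of the calendar passed in.
    $dateFormatter->setCalendar($cal);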
    

    Add that to the previous code, and we get matching outputs:

    Is 1582-10-03 141439 days or 141439 days before 1970?
    Is 1582-10-04 141438 days or 141438 days before 1970?
    Is 1582-10-05 141437 days or 141437 days before 1970?
    Is 1582-10-06 141436 days or 141436 days before 1970?
    Is 1582-10-07 141435 days or 141435 days before 1970?
    Is 1582-10-08 141434 days or 141434 days before 1970?
    Is 1582-10-09 141433 days or 141433 days before 1970?
    Is 1582-10-10 141432 days or 141432 days before 1970?
    Is 1582-10-11 141431 days or 141431 days before 1970?
    Is 1582-10-12 141430 days or 141430 days before 1970?
    Is 1582-10-13 141429 days or 141429 days before 1970?
    Is 1582-10-14 141428 days or 141428 days before 1970?
    Is 1582-10-15 141427 days or 141427 days before 1970?
    Is 1582-10-16 141426 days or 141426 days before 1970?
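
    Applied to the dates from the question, the same change should make the round trip stable. This is a hedged sketch rather than a drop-in fix for Symfony: it assumes the intl extension is loaded, pins the locale to `en_US` and the time zone to UTC purely for the illustration, and uses a made-up `$roundTrip` array to collect the results.

```php
<?php
// Assumes ext-intl is loaded. Locale, pattern and time zone are pinned for the
// illustration only; adjust them to the application's settings.
$calendar = \IntlGregorianCalendar::createInstance('UTC', 'en_US');
$calendar->setGregorianChange((float) PHP_INT_MIN); // no Julian period at all

$formatter = new \IntlDateFormatter(
    'en_US',
    \IntlDateFormatter::MEDIUM,
    \IntlDateFormatter::NONE,
    'UTC',
    \IntlDateFormatter::GREGORIAN,
    'yyyy-MM-dd'
);
// The formatter works on its own copy of the calendar and adopts its time zone.
$formatter->setCalendar($calendar);

$roundTrip = [];
foreach (['0001-01-01', '1000-01-01', '1582-01-01', '1583-01-01'] as $input) {
    $timestamp = $formatter->parse($input);
    // "@<timestamp>" is always read as UTC, so the default time zone of the
    // PHP process does not interfere with the comparison.
    $roundTrip[$input] = (new \DateTimeImmutable(sprintf('@%d', $timestamp)))->format('Y-m-d');
    echo $input, ' ', $roundTrip[$input], PHP_EOL;
}
```

    If the cut-over change takes effect as described above, each line should show the same date twice, because ICU and timelib now both count in the proleptic Gregorian calendar.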
    

    Note: I believe the appropriate bug in Symfony's tracker is #29610