Search code examples
javapostgresqljdbctimestamp

Jdbc persists wrong timestamp at 1900


I'm developing an app that store some data with timestamp in the very-past.

My JVM is IBM Semeru (17) running on Europe/Paris tz. But I want to store the timestamp at UTC (GMT+0).

Bellow is my code:

    <dependencies>
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <version>42.7.1</version>
        </dependency>
    </dependencies>
public class Main {

    /*
    create table theentity (
        theid integer not null,
        thevalue timestamp(6),
        primary key (theid)
    )
     */
    public static void main(String[] args) {
        TimeZone.setDefault( TimeZone.getTimeZone( ZoneId.of( "Europe/Paris" ) ) );
        LocalDateTime d_1900_01_01_T_00_09_23 = LocalDateTime.of( 1900, 1, 1, 0, 9, 23, 0 );
        LocalDateTime d_1900_01_01_T_00_09_22 = LocalDateTime.of( 1900, 1, 1, 0, 9, 22, 0 );
        LocalDateTime d_1900_01_01_T_00_09_21 = LocalDateTime.of( 1900, 1, 1, 0, 9, 21, 0 );
        LocalDateTime d_1900_01_01_T_00_09_20 = LocalDateTime.of( 1900, 1, 1, 0, 9, 20, 0 );
        LocalDateTime d_1900_01_01_T_00_09_19 = LocalDateTime.of( 1900, 1, 1, 0, 9, 19, 0 );

        try(Connection c = DriverManager.getConnection( "jdbc:postgresql://localhost:5432/hibernate_orm_test?preparedStatementCacheQueries=0&escapeSyntaxCallMode=callIfNoReturn",
                                                                                                        "postgres", "root")) {
            PreparedStatement p = c.prepareStatement( "insert into theentity values(?, ?)" );
            bindAndExecute( p, 1, d_1900_01_01_T_00_09_23 );
            bindAndExecute( p, 2, d_1900_01_01_T_00_09_22 );
            bindAndExecute( p, 3, d_1900_01_01_T_00_09_21 );
            bindAndExecute( p, 4, d_1900_01_01_T_00_09_20 );
            bindAndExecute( p, 5, d_1900_01_01_T_00_09_19 );
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void bindAndExecute(PreparedStatement p, int id, LocalDateTime localDateTime)
            throws SQLException {
        p.setInt( 1, id );
        p.setTimestamp(2,
                Timestamp.valueOf( localDateTime ),
                Calendar.getInstance( TimeZone.getTimeZone( ZoneId.of( "GMT" ) ) )
        );
        p.executeUpdate();
    }

}

In sort, i tried to persist 5 timestamps (note that these timestamps are based on Europe/Paris because of TimeZone.setDefault( TimeZone.getTimeZone( ZoneId.of( "Europe/Paris" ) ) );)

  • 1900-01-01T00:09:23
  • 1900-01-01T00:09:22
  • 1900-01-01T00:09:21
  • 1900-01-01T00:09:20
  • 1900-01-01T00:09:19

This line would translate Europe/Paris tz to UTC tz:

p.setTimestamp(2,
                Timestamp.valueOf( localDateTime ),
                Calendar.getInstance( TimeZone.getTimeZone( ZoneId.of( "GMT" ) ) ) );

Execute it would create 5 rows in postresql:

theid |      thevalue       
-------+---------------------
     1 | 1900-01-01 00:00:02
     2 | 1900-01-01 00:00:01
     3 | 1900-01-01 00:00:00
     4 | 1899-12-31 23:09:20
     5 | 1899-12-31 23:09:19
(5 rows)

Most of you might think Paris is at GMT+1. It is correct, but WAS NOT correct. At 1900, it was at GMT+00:09:21 (9 minutes, 21 sec)!!

So the first 3 timestamps were saved correctly to the table. But, the strange is at row 4. It is 1899-12-31 23:09:20 but I would expected it to be 1899-12-31 23:59:59. But seems that at 1899, the old DateTime API think it is at GMT+1. It caused row 4, 5 an so on... to be wrong!!

Can you guys explain this??


Solution

  • You asked something similar a week ago and I referred to the problem in a comment on that question.

    The underlying explanation - tzdata confusion and OpenJDK bugs.

    Pre 1970, OpenJDK timezones are completely broken. As in, none of it is guaranteed, and there is no fix on the horizon: The OpenJDK project is unwilling to understand how tzdata's policies have been changed or is unwilling to support pre-1970 proper times if they do understand what the 2022b update to the tzdata project means.

    What's tzdata?

    tzdata is a separate project used by loads of things, including linux kernels and JDKs; the project is a file with exact definitions of a great many timezones and historic data on how and when they changed over time.

    Given how difficult it is to historically determine precisely what a country's timezone was at any given moment in time in the distant past, tzdata's base set no longer guarantees anything if it's about pre-1970 timezone info. For, I guess, historical reasons it still ships a bunch of pre-1970 definitions and changes, but does not guarantee that these are correct. And indeed they often are not: Oversimplified, often known incorrect stuff. Some attempt to juggle simplicity and reduction of the size of the tz tables with being at least somewhat accurate at work, I presume.

    This isn't the fault of tzdata; the central tenet of this shift in how tzdata works is in essence correct: It is not possible to be highly certain of timezone data pre-1970; there are too few records that can be easily verified on the extremely limited budget of open source (tzdata is an open source project run more or less by one person; they can't fly to Paris and check some microfilm stuff in the national library or some such), and the varied opinions on what places even existed gets very muddy indeed. Should we have a Europe/Vichy? What about Prussia/Koningsberg (part of unified germany before the first world war, shifted hands a bit, and is now modern day Kaliningrad in Russia?)

    But, before 2022 or so, tzdata tried to do it all and be as accurate as possible. But, since then, no more: Instead, there is an extended tzdata file that is notably less 'verified' (i.e. if you attempt to file a bug against the extended data about some obscure few months where arguably, based on hard to verify evidence, a different timezone was used for a while, it might be ignored even if on balance it's more likely to be true than not, and certainly won't be picked up with high priority).

    But, that extended file tracks perfectly with what tzdata looked like in 2022.

    OpenJDK only has the base (verified only post 1970) tzdata file; it does not include the extended definitions.

    Hence, the bug: Europe/Paris pre 1970 is likely incorrect. I know for a fact that Europe/Amsterdam in the 1938-1945 range is incorrect.

    What does this look like

    With a JDK based on tzdata 2022a release or earlier, this code:

    // Warning: Ordinarily, don't use the `Calendar` API,
    // it's bad. But to reproduce this bug it's slightly easier.
    // Using `java.time` (or `java.util.Date`) does not make it go away.
    
            var gc = new GregorianCalendar(TimeZone.getTimeZone("Europe/Amsterdam"));
            gc.set(Calendar.YEAR, 1937);
            gc.set(Calendar.MONTH, Calendar.AUGUST);
            gc.set(Calendar.DAY_OF_MONTH, 2);
            gc.set(Calendar.HOUR_OF_DAY, 0);
            gc.set(Calendar.MINUTE, 0);
            gc.set(Calendar.SECOND, 0);
            gc.set(Calendar.MILLISECOND,0);
            System.out.println(gc.getTimeInMillis());
    

    This prints -1022980800000L on 'correct' JDKs (at this point, pretty old ones, built before 2022). That is as far as history buffs all agree, the correct time.

    But on modern JDKs, it doesn't print that; not anymore. Ever since JDKs integrated tzdata 2022-b (the first release with simplified pre-1970 stuff), you get a different and wrong number.

    The effect of this on your software depends on how you use dates, or how your deps use it.

    As a specific example, H2 (the database engine) will mess up completely. If you store a date, even if you do so with the highly recommended strategy of writing with preparedStatement.setObject(LocalDate.of(1937, 8, 2)) and reading with the recommended resultSet.getObject(colIdx, LocalDate.class) (which the JDBC spec guarantees will work on any compliant JDBC impl), and date you wrote on a pre-2022b JDK and read back on a post-2022b JDK will respond with a date that's one off.

    And 1937 is an entirely plausible year to be born in. Birthdates are used as keys in many circumstances (especially medical) so this breaks everything.

    Fixing it - step 1

    You need to do two things to fix your JDK. Step 1 is to fix the tzdata files:

    There's the tzupdater tool from oracle which lets you rewrite the tzdata files used by a JDK. Download it and use it as follows on a posix system:

    sudo java -jar tzupdater.jar -v -f -l https://www.iana.org/time-zones/repository/releases/tzdata2022a.tar.gz
    

    Note that this means any tz updates since 2022 (hey, political entities sometimes decree timezone changes, it happens) are removed by switching to this def; 2022a is the last time the set was 'pre-1970 best effort'. I don't have a solution if you also need any post-2022a changes at this time.

    This will update the JDK installation that is providing the java executable you run that command with. Naturally, you need admin access as the JDK will need to rewrite its own files to do this.

    Step 2 - Be very careful with date to millis transitions

    A proper DB design will store a LocalDate as its component fields: A date of 1937-08-02 should be stored exactly like that: Store those 3 numbers. Use some bit twiddling to shove them into as few bits as you can muster if you want, but, that's unfortunately not what H2 v1 does; instead it uses some decreed timezone to convert it to the epoch-millis on that date at midnight, stores that, and when reading that data out, the same aspect is done in reverse. That's why H2 returned the wrong date even if you use LocalDate both 'in' and 'out'.

    psql, for example, wouldn't have done that and returns a stable date even if you 'write' the data on a JDK with 2022a and 'read' it on 2022b or higher.

    However, very important: java.sql.Timestamp and especially java.sql.Date are totally broken and you must not use them. Unfortunately many abstractions such as JPA impls often do, I don't know how to fix that. The problem is that those 2 types extends java.util.Date and that class is broken (that's why it is deprecated): It's a lie. It does not represent dates, it represents instants. That's why all the date-related methods (such as date.getYear()) are deprecated: Because the answer they give can be wrong and that cannot be fixed. And because java.sql.Date and java.sql.Timestamp extend this broken class, they are broken too.

    The ONLY way to do dates right with databases is to pray your DB stores it properly (which H2 v1 unfortunately does not), and that you use the right way to relay date/time data into and out of the DB:

    to write:

    preparedStatement.setObject(qIdx, instanceOfProperType);
    

    where you can pass any instance of type LocalDate, LocalDateTime, Instant, or OffsetDateTime. Unfortunately, the date/time type that is most often the most appropriate thing is ZonedDateTime and this isn't supported by any DB I know of, or isn't supported properly (they don't know how to bittwiddle the rather wordy Europe/Paris into a few bits, I guess, it makes some sense they find this difficult to store).

    To read:

    LocalDate x = resultset.getObject(colIdxOrName, LocalDate.class);
    

    Where you can also use Instant or LocalDateTime, or OffsetDateTime instead of LocalDate.

    Why isn't there a .getLocalDate like there's getDate?

    Because the JDBC panel decided to stop adding a getType() method for every new type that comes up as baseline DB <-> java type; no such methods will ever be made again. getObject is it. Instead the JDBC spec has started decreeing certain types as 'must be supported'. LocalDate and the other types named above are on the list. ZonedDateTime is not.

    Naturally, you yourself must also never convert 'human reckoning' (time stated in terms of years, months, hours, weeks, that sort of thing) to 'computer reckoning' (epoch millis). Or, at least, never convert only to convert back again later, because that operation is not reversible even though you'd think it would be. "Convert this year/month/day hour:minute:second value in Paris to epochmillis.. and then tomorrow I'm going to ask you to convert that back" is not actually guaranteed to give you the same value. So, don't do it. Store your date/time stuff exactly as it is. Do not think Instant+TimeZone is perfectly convertible back-and-forth to a ZonedDateTime instance. Because tzdata can change.

    Beware auto-updates!

    If your server platform updates its JDK version, your tzdata update is overwritten.

    I suggest you run this code on boot of your java app and hard exit immediately if you detect the tzdata isn't up to date; actually running at this point will mean your app starts corrupting your DB and you don't want that:

        private void checkTimezoneBackzoneIssue() {
            var gc = new GregorianCalendar(TimeZone.getTimeZone("Europe/Amsterdam"));
            gc.set(Calendar.YEAR, 1937);
            gc.set(Calendar.MONTH, Calendar.AUGUST);
            gc.set(Calendar.DAY_OF_MONTH, 2);
            gc.set(Calendar.HOUR_OF_DAY, 0);
            gc.set(Calendar.MINUTE, 0);
            gc.set(Calendar.SECOND, 0);
            gc.set(Calendar.MILLISECOND,0);
            if (gc.getTimeInMillis() != -1022980800000L) {
                System.err.println("Timezone data is broken: It no longer contains pre-1970 info (the 'backzone'), which breaks Europe/Amsterdam particularly badly; all ");
                System.err.println("dates prior to May 20th 1940 will be read back wrong, this affects birthdays stored in H2 databases, for example.");
                System.err.println("The JVM will now be shut down to avoid data corruption.");
                System.err.println("---");
                System.err.println("You can 'fix' a JDK using https://www.oracle.com/java/technologies/downloads/tools/#TZUpdater, running java -jar tzupdater.jar -v -f -l https://www.iana.org/time-zones/repository/releases/tzdata2022a.tar.gz");
                System.err.println("Your java is at: " + System.getProperty("java.home"));
                System.exit(18);
            }
        }