
Parsing a string into a ZonedDateTime with a DateTimeFormatter


I'm trying to parse this String into a ZonedDateTime:

"Mon 14 Aug 2017 02:00 AM CEST"

Here is my latest attempt:

System.out.println("Test ZonedDateTime: " + ZonedDateTime.parse(
            "Mon 14 Aug 2017 02:00 AM CEST",
            DateTimeFormatter.ofPattern("EEEE dd M yyyy KK:mm a z")));

And the result:

Exception in thread "main" java.time.format.DateTimeParseException: Text 'Mon 14 Aug 2017 02:00 AM CEST' could not be parsed at index 0
at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949)
at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)
at java.time.ZonedDateTime.parse(ZonedDateTime.java:597)
at be.hypertux.test.localtime.Main.main(Main.java:17)

Any ideas?


Solution

  • One problem is that short timezone names like CEST and CET are ambiguous and not standardized. Ideally, use IANA timezone names (typically in the format Region/City, like America/Sao_Paulo or Europe/Berlin).

    I'm assuming that CEST is Central European Summer Time, which is used by many different countries. That's why it's ambiguous: you can't know which country or region is meant, because the abbreviation covers too broad a range.

    Although most abbreviations are not recognized (due to their ambiguity), some "defaults" are assumed for backward-compatibility reasons. In the version I'm using (JDK 1.8.0_131), CEST defaults to Europe/Paris, but I'm not sure that's what you need, and this is not guaranteed to work for all abbreviations. In this case, you can define a preferred timezone to be used (it will be an arbitrary choice, but there's no other way since CEST is ambiguous).

    Another problem is that the month and day of week are in English (Aug and Mon), and you didn't specify a java.util.Locale. In this case, the DateTimeFormatter takes the system's default locale (and it's probably not English - check the value of Locale.getDefault()). Anyway, the default locale can be changed without notice, even at runtime, so it's better to specify one when you're dealing with localized data (like month and day of week names).
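
    To see the effect of the locale in isolation, here is a minimal sketch that parses only the date part of the input. The class and method names (LocaleParsingSketch, parseDate, parsesWith) are hypothetical, and Locale.FRENCH is just an arbitrary example of a non-English locale:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class LocaleParsingSketch {

    // Parses text such as "Mon 14 Aug 2017" using the day and month names of the given locale.
    static LocalDate parseDate(String text, Locale locale) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("EEE dd MMM yyyy", locale);
        return LocalDate.parse(text, formatter);
    }

    // Returns true if the text can be parsed with the given locale, false otherwise.
    static boolean parsesWith(String text, Locale locale) {
        try {
            parseDate(text, locale);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String input = "Mon 14 Aug 2017";
        // The locale a formatter uses when none is specified
        System.out.println("Default locale: " + Locale.getDefault());
        System.out.println("English: " + parsesWith(input, Locale.ENGLISH));
        // French day and month names differ, so the English text is rejected
        System.out.println("French:  " + parsesWith(input, Locale.FRENCH));
    }
}
```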

    So, you must specify a locale and define an arbitrary timezone as the preferred one to be used when an ambiguous name like CEST is found. For that, you can use a java.time.format.DateTimeFormatterBuilder, a set of preferred timezones and a java.time.format.TextStyle:

    import java.time.ZoneId;
    import java.time.ZonedDateTime;
    import java.time.format.DateTimeFormatter;
    import java.time.format.DateTimeFormatterBuilder;
    import java.time.format.TextStyle;
    import java.util.HashSet;
    import java.util.Locale;
    import java.util.Set;

    // create set of preferred timezones
    Set<ZoneId> zones = new HashSet<>();
    // my arbitrary choice for CEST
    zones.add(ZoneId.of("Europe/Brussels"));
    DateTimeFormatter formatter = new DateTimeFormatterBuilder()
        // date and time
        .appendPattern("EEE dd MMM yyyy hh:mm a ")
        // timezone short name with custom set of preferred zones
        .appendZoneText(TextStyle.SHORT, zones)
        // create formatter (use English locale for month and day of week)
        .toFormatter(Locale.ENGLISH);

    String input = "Mon 14 Aug 2017 02:00 AM CEST";
    System.out.println(ZonedDateTime.parse(input, formatter));
    

    The output will be:

    2017-08-14T02:00+02:00[Europe/Brussels]

    Note that I used Europe/Brussels as the preferred timezone. You can check all the available zone names (and choose accordingly) with ZoneId.getAvailableZoneIds().
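
    If you want to narrow down the candidates, one possible approach is to filter ZoneId.getAvailableZoneIds() by what each zone was doing on the date in question. The sketch below keeps the zones that were at UTC+02:00 with daylight saving time in effect at a given instant. That is only a proxy for "uses CEST" (an assumption on my part, since it does not look at the abbreviation itself), and the class and method names are hypothetical:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;
import java.util.List;
import java.util.stream.Collectors;

class CestCandidatesSketch {

    // Lists zone IDs that were at UTC+02:00 with daylight saving time in effect
    // at the given instant. This is only a proxy for "uses CEST".
    static List<String> summerZonesAtPlusTwo(Instant instant) {
        ZoneOffset plusTwo = ZoneOffset.ofHours(2);
        return ZoneId.getAvailableZoneIds().stream()
                .filter(id -> {
                    ZoneRules rules = ZoneId.of(id).getRules();
                    return rules.getOffset(instant).equals(plusTwo)
                            && rules.isDaylightSavings(instant);
                })
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Midnight UTC on the date from the question
        Instant instant = Instant.parse("2017-08-14T00:00:00Z");
        summerZonesAtPlusTwo(instant).forEach(System.out::println);
    }
}
```

    The exact list depends on the timezone data shipped with your JDK, so treat it as a starting point for choosing a preferred zone rather than a definitive answer.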


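    I'm using hh for the hours, which is the clock-hour-of-am-pm field (values from 1 to 12). But in your code you used KK, which is the hour-of-am-pm field (values from 0 to 11). Check which one is best for your case. Note also that my pattern uses EEE and MMM (the short text forms) to match Mon and Aug. Your pattern used EEEE, which expects the full day name (Monday), and M, which expects the month as a number, so parsing failed right at index 0.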
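
    The difference between the two hour fields is easiest to see around midnight. A small sketch (the class and constant names are hypothetical):

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class HourFieldsSketch {

    // hh = clock-hour-of-am-pm (1 to 12)
    static final DateTimeFormatter CLOCK_HOUR =
            DateTimeFormatter.ofPattern("hh:mm a", Locale.ENGLISH);
    // KK = hour-of-am-pm (0 to 11)
    static final DateTimeFormatter HOUR_OF_AMPM =
            DateTimeFormatter.ofPattern("KK:mm a", Locale.ENGLISH);

    public static void main(String[] args) {
        LocalTime afterMidnight = LocalTime.of(0, 30);
        System.out.println(CLOCK_HOUR.format(afterMidnight));   // 12:30 AM
        System.out.println(HOUR_OF_AMPM.format(afterMidnight)); // 00:30 AM

        // For hours 1 to 11 both patterns read the text the same way
        System.out.println(LocalTime.parse("02:00 AM", CLOCK_HOUR));   // 02:00
        System.out.println(LocalTime.parse("02:00 AM", HOUR_OF_AMPM)); // 02:00
    }
}
```

    So for the input in the question (02:00 AM) either field gives the same result; the choice only matters for the hour right after midnight or noon.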


    A timezone is the set of all the offsets that a region has had, has, and will have throughout its history, together with the dates when changes such as Daylight Saving Time start and end. If two regions differ anywhere in this history, they have different timezones (even if they follow the same rules today).

    Just because Paris and Brussels use the same rules today (CET and CEST) doesn't mean it will stay that way forever: timezone rules are defined by governments and laws, and there's no guarantee they won't change in the future.
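
    You can check this with the rules that java.time exposes for each zone. The sketch below compares Europe/Paris and Europe/Brussels at the date from the question and at an arbitrary date in 1900 (my choice, just for illustration); the class and method names are hypothetical, and the historical values printed depend on the timezone data in your JDK:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

class ZoneHistorySketch {

    static final ZoneRules PARIS = ZoneId.of("Europe/Paris").getRules();
    static final ZoneRules BRUSSELS = ZoneId.of("Europe/Brussels").getRules();

    // True if both zones have the same UTC offset at the given instant.
    static boolean sameOffsetAt(Instant instant) {
        return PARIS.getOffset(instant).equals(BRUSSELS.getOffset(instant));
    }

    public static void main(String[] args) {
        Instant summer2017 = Instant.parse("2017-08-14T00:00:00Z");
        Instant year1900 = Instant.parse("1900-01-01T00:00:00Z");

        System.out.println("Same offset in 2017: " + sameOffsetAt(summer2017));
        System.out.println("Paris offset in 1900:    " + PARIS.getOffset(year1900));
        System.out.println("Brussels offset in 1900: " + BRUSSELS.getOffset(year1900));
        // equals() compares the whole rule set, including historical transitions
        System.out.println("Identical rule sets: " + PARIS.equals(BRUSSELS));
    }
}
```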

    That's why you must define some specific timezone instead of relying on ambiguous short names (even though their use is common and widespread).