Search code examples
kettleiso8601datetimeoffset

Process ISO8601 timestamps with offsets in Pentaho Data Integration


In Pentaho's Data Integration (aka PDI and Kettle), I'd like to do some processing on dates and times that are in ISO 8601 format and have timezone offset information. For example:

2013-07-26T18:00:00-07:00

As of this writing, it's known that Kettle does not understand this date format (despite its standardization 26 years ago and at least one subsequent PSA).

How can I operate on this data without resorting to string parsing?


Solution

  • This took hours for me to figure out, so I'm posting Q&A style to hopefully save someone else the trouble.

    PDI ships with a transform step called User Defined Java Expression (UDJE), which can take in fields and output the results of Java expressions on those fields. It can also access the Joda-Time library. Joda eats ISO8601's for breakfast.

    The Java expression:

    org.joda.time.format.ISODateTimeFormat.dateTimeNoMillis().parseDateTime(dt8601)
    

    will take in a field dt8601 and leave you with a Joda DateTimeFormatter that can work with the date. In my case I wanted to convert to Unix time, so my full expression was:

    org.joda.time.format.ISODateTimeFormat.dateTimeNoMillis().parseDateTime(dt8601).getMillis() / 1000L
    

    For the other fields in the UDJE, "New field" is the field name you want to give to the results of this expression, and "Value type" must be Integer (which, in PDI, corresponds to Java's Long).

    Separately, a couple cautionary tales about approaches that did not work:

    • Using a "Modified Java Script Value" step. You'd think maybe you could use Date.parse, but all I got was NaN. There's a bunch of custom date/string conversion functions bundled with this step, but none handle timezones nor ISO8601s. I found the step's documentation lacking in clarity and examples and gave up.
    • Using javax.xml.bind.DatatypeConverter.parseDateTime(dt8601).getTime() for the UDJE expression, as suggested in a Pentaho forums post and an SO Q&A. This makes use of an XML DatatypeConverter and returns a Calendar object which you can further process. As the poster notes though, XML dates are only "inspired by" ISO8601. This will probably function, but I for one am tired of writing date processing code and want something that strictly conforms to the international standard.