Search code examples
datetalend

Date Column Split in Talend


So I have one big file (13 million rows) and date formatted as:
2009-04-08T01:57:47Z. Now I would like to split it into 2 columns now,
one with just date as dd-MM-yyyy and other with time only hh:MM.

How do I do it?


Solution

  • You can simply use tMap and parseDate/formatDate to do what you want. It is neither necessary nor recommended to implement your own date parsing logic with regexes.

    First of all, parse the timestamp using the format yyyy-MM-dd'T'HH:mm:ss'Z'. Then you can use the parsed Date to output the formatted date and time information you want:

    • dd-MM-yyyy for the date
    • HH:mm for the time (Note: you mixed up the case in your question, MM stands for the month)

    If you put that logic into a tMap:

    Using tMap to split a timestamp into date and time columns

    you will get the following:

    • Input:
      • timestamp 2009-04-08T01:57:47Z
    • Output:
      • date 08-04-2009
      • time 01:57

    NOTE

    Note that when you parse the timestamp with the mentioned format string (yyyy-MM-dd'T'HH:mm:ss'Z'), the time zone information is not parsed (having 'Z' as a literal). Since many applications do not properly set the time zone information anyway but always use 'Z' instead, so this can be safely ignored in most cases.

    If you need proper time zone handling and by any chance are able to use Java 7, you may use yyyy-MM-dd'T'HH:mm:ssXXX instead to parse your timestamp.