Talend workflow for importing multiple gzip files, un-archiving them, and creating a calculated field


I want to (1) read multiple gzip files from a path, (2) un-archive them, and (3) create a calculated field. So far I have managed steps 1 and 2. For step 3, I thought tMap would do the job, but I cannot connect the un-archive component to tMap and I don't know why.

Edit1: I don't know why the delimited-file input component (tFileInputDelimited) and tMap are showing an error message.

Below is the message I got:

Starting job Migration_1 at 09:36 04/04/2017.

[statistics] connecting to socket on port 3336
[statistics] connected
[statistics] disconnected
Job Migration_1 ended at 09:36 04/04/2017. [exit code=0]

Edit2: I tried all the suggested steps, but the job still does not produce the required output, and to my surprise the log contains no error message that would help me debug it.

Starting job Migration_1 at 12:36 04/04/2017.

[statistics] connecting to socket on port 3463
[statistics] connected
[statistics] disconnected
Job Migration_1 ended at 12:36 04/04/2017. [exit code=0]

Solution

  • tFileUnarchive only extracts the archives; it does not read the files they contain, so it has no row output to connect to tMap. You still need a separate step that reads the extracted files.

    After the tFileList-->tFileUnarchive subjob, add a file-reading subjob such as:

    tFileList--iterate-->tFileInput*-->tMap
    

    This second tFileList should point to the directory where you extracted the gzip files.
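
    For readers who want to see the same data flow outside the Talend designer, here is a minimal plain-Java sketch of the stages above: list the files, decompress them, read the rows, then add a calculated field. It is not the code Talend generates. The class name GzipPipelineSketch, the semicolon separator, and the name;quantity;unit_price column layout are all assumptions made for illustration. The sketch also decompresses each file in memory instead of extracting it to a directory first, as the two-subjob design does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class; names and column layout are illustrative only.
final class GzipPipelineSketch {

    // Roughly what tFileList does: collect every *.gz file in a directory.
    static List<Path> listGzipFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.gz")) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        Collections.sort(files); // deterministic processing order
        return files;
    }

    // Roughly tFileUnarchive plus a tFileInput* component:
    // decompress the file and read it line by line.
    static List<String> readLines(Path gzFile) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Roughly what a tMap expression does: derive a new column from existing ones.
    // Assumed layout: name;quantity;unit_price -> appends quantity * unit_price.
    static String addCalculatedField(String line) {
        String[] cols = line.split(";", -1);
        long total = Long.parseLong(cols[1].trim()) * Long.parseLong(cols[2].trim());
        return line + ";" + total;
    }

    // Run all stages over every gzip file in the directory.
    static List<String> run(Path dir) throws IOException {
        List<String> out = new ArrayList<>();
        for (Path gz : listGzipFiles(dir)) {
            for (String line : readLines(gz)) {
                out.add(addCalculatedField(line));
            }
        }
        return out;
    }
}
```

    Calling GzipPipelineSketch.run on a directory of .gz files returns each input row with the calculated value appended. Inside Talend, the equivalent of addCalculatedField would be an expression on a tMap output column.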