pentaho kettle

In Kettle, Text File Input fails to read a CSV file from a tar.gz file. What might be wrong?


I have a CSV file that is tarred and gzipped, so I have test.tar.gz.
I would like to read the CSV file through the Text File Input step.
I tried the path tar:gz:file://C:/test/test.tar.gz!/test.tar! with a wildcard like ".*\.csv".
But it sometimes fails to read the file.
It throws this exception:

 org.apache.commons.vfs.FileNotFolderException: 
 Could not list the contents of 
 "tar:gz:file:///C:/test/test.tar.gz!/test.tar!/" 
  because it is not a folder.

I use Windows 8.1 and PDI 5.2.
What might be wrong?


Solution

  • When reading a compressed CSV file, the "Text File Input" step in Pentaho Kettle only supports the first file inside the compressed archive (either a Zip or a GZip file). Check the compression section of the Pentaho Wiki.

    For your issue, try removing the wildcard entry, since only the first file inside the zip/gzip file will be read (as explained above).

    I have placed sample code that covers reading both zip and gzip files. Check it here.

    Hope it helps :)
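Since only the first file in the archive is read, it can help to check what the archive actually contains and in which order. The following is a minimal sketch in Python, run outside Kettle purely as a diagnostic; it is not part of PDI. The function names `list_members` and `read_first_csv` are hypothetical helpers made up for this illustration, and the path in the usage note is the one from the question.

```python
import csv
import io
import tarfile


def list_members(archive_path):
    """Return the names of regular files in a .tar.gz archive, in archive order."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return [m.name for m in archive.getmembers() if m.isfile()]


def read_first_csv(archive_path, encoding="utf-8"):
    """Return the rows of the first .csv member, or None if there is none.

    The encoding default is an assumption; adjust it to match the file.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.lower().endswith(".csv"):
                raw = archive.extractfile(member)
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                # Read all rows before the archive is closed.
                return list(csv.reader(text))
    return None
```

For example, `list_members(r"C:\test\test.tar.gz")` shows whether the CSV file is the first entry in the archive, and `read_first_csv(r"C:\test\test.tar.gz")` confirms that the file itself is readable.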