
Talend tFileOutputDelimited component - problems with the split .csv files


I tried the Talend forum without success, so I am asking here as well.

I have a job that reads a large table and writes the data to .csv files in increments of 25000 rows. What I have noticed is that every .csv file created after the first one has all of its data on a single row, whereas the first .csv file has the data in 25000 rows (as I want it).

Is there a setting on the tFileOutputDelimited component that will make the rows in all subsequent .csv files get written as they are in the first (and 'good') .csv file? I think it may be due to the 'Escape char' value on the 'Advanced settings' tab, but I am not sure.

On the tFileOutputDelimited component's 'Basic settings' tab, the CSV Row Separator value is CRLF("\r\n") and the field separator is ",". On the component's 'Advanced settings' tab, the Escape char value is """ and the Text enclosure value also is """.
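For reference, here is a minimal sketch of what these four settings conventionally mean for one CSV row. It is an illustration of the usual CSV convention (enclose each field in the text enclosure, and put the escape char in front of any enclosure character inside the field), not the code Talend generates. The class and method names (`CsvRowSketch`, `encloseField`, `formatRow`) are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not taken from Talend's generated code.
public class CsvRowSketch {
    static final String FIELD_SEPARATOR = ",";   // 'Basic settings' field separator
    static final String ROW_SEPARATOR = "\r\n";  // 'Basic settings' CSV Row Separator (CRLF)
    static final char ENCLOSURE = '"';           // 'Advanced settings' Text enclosure
    static final char ESCAPE = '"';              // 'Advanced settings' Escape char

    // Wrap a field in the enclosure and prefix each embedded enclosure with the escape char.
    static String encloseField(String value) {
        String escaped = value.replace(String.valueOf(ENCLOSURE), "" + ESCAPE + ENCLOSURE);
        return ENCLOSURE + escaped + ENCLOSURE;
    }

    // Join the enclosed fields with the field separator and end the row with the row separator.
    static String formatRow(List<String> fields) {
        return fields.stream()
                .map(CsvRowSketch::encloseField)
                .collect(Collectors.joining(FIELD_SEPARATOR)) + ROW_SEPARATOR;
    }

    public static void main(String[] args) {
        System.out.print(formatRow(List.of("1", "say \"hi\"")));
    }
}
```

Under this convention the escape char only affects enclosure characters inside a field; where one row ends is decided by the row separator alone.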

Also, this is being run in a Windows 7 environment.

Unfortunately, the documentation I found for the tFileOutputDelimited component's 'Advanced settings' tab says little about the CSV options.

Below is an example of what is being encountered. As listed below, the first file looks great but all files that follow do not break on the line break and end up placing all of the data on one row versus individual rows.

File #1

header row
row 1
row 2
row 3
...
row 25000

File #2...

header rowrow1row2...row25000

File #3...

header rowrow1row2...row25000

If you need more details, let me know and I'll send them right off. Thank you in advance.


Solution

  • Figured it out. As mentioned in my initial post, the CSV Row Separator had been set to the CRLF("\r\n") option. I changed this to LF("\n") and that fixed the problem. I had looked at the generated Java code and noticed that it was not treating CRLF("\r\n") as one of the default options; only \n and \r were. This pointed me in the direction of trying the \n option.
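To confirm which row separator actually ended up in each split file, one option is to count the terminators directly rather than rely on how an editor displays the file. The following is a minimal sketch under that assumption; the class name `LineEndingCount` and its `count` method are hypothetical helpers, not part of Talend.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic helper; not part of Talend.
public class LineEndingCount {

    // Returns {CRLF pairs, lone LF, lone CR} found in the given bytes.
    static int[] count(byte[] data) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    crlf++;
                    i++; // skip the '\n' that belongs to this CRLF pair
                } else {
                    cr++;
                }
            } else if (data[i] == '\n') {
                lf++;
            }
        }
        return new int[] {crlf, lf, cr};
    }

    public static void main(String[] args) throws Exception {
        // Pass a .csv path as the first argument; otherwise a small built-in sample is used.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "header\r\nrow 1\r\nrow 2\n".getBytes(StandardCharsets.US_ASCII);
        int[] c = count(data);
        System.out.println("CRLF=" + c[0] + " LF=" + c[1] + " CR=" + c[2]);
    }
}
```

Running it against the first file and then a later file shows whether the later files contain the expected terminators at all, or a different mix from the first.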