Search code examples
csvtalend

Reading and splitting a CSV File with Talend


The csv file contains more than one table, it might look like this:

 "Table 1"
 ,
 "id","visits","downloads","emailsent"
  1, 4324, 23, 2
  2, 664, 42, 1
  3, 73, 44, 0
  4, 914, 8, 0
  ...

 "Table 2"
 ,
 "id_of_2nd_tab","visits_of_2nd_tab","downloads_of_2nd_tab"
  1, 524, 3
  2, 564, 52
  3, 63, 84
  4, 814, 8
  ...

What is the best way to import those tables into Talend ?


Solution

  • Generally, that kind of multi-record format CSV format is more complex to parse.

    Question : Are there are a finite number of tables? Question: Does each table have a fixed number and order of columns? Question: What is the separator between "tables" within the CSV?

    I believe you need to take a multi-pass approach. You could do something like this.

    Pass #1 - Use tFileInputDelimited Use row separator such as "Table", No field separator, Grab 1 big field Alternatively, you could split the first file into separate files at this stage.

    Pass #2 - Split Row (on results from pass #1) on the Row Separator "\r\n" etc Split it into multiple rows but of a single column.

    Pass #3 - Extract Delimited Fields (on results from pass #2) Extract based on a field separator Recognize a "Table" row Recognize a "Header row Additional handling per Table / set of fields in header