talend csv

Dealing with a weird delimited data format in Talend or another tool?


I have a weird delimited format that I am not familiar with. It is the output of a chat-related application, and the format is peculiar to me. Can anyone tell me what this delimited format is, whether it is standard, and whether there is a way to convert it to CSV with text qualifiers?

"NumValue1|""TextValue2""|""TextValue3""|""TextValue"""

My assumptions about this format are that each row is wrapped in ", the text qualifiers are "" text "", and the delimiter is |

Also, what is the advantage of delimiting in this format as opposed to, say, CSV with text qualifiers? The text values don't seem to contain " themselves.

Talend is my preferred tool, but I am open to using anything to solve this problem.


Solution

  • I think this is a nested structure. The original data was probably a pipe-delimited file with quote-enclosed text fields:

    NumValue1|"TextValue2"|"TextValue3"|"TextValue"

    Then someone wanted to enclose each whole line in quotes, so the original quotes had to be escaped. They did that by doubling them (a common technique in SQL).

    My quick and dirty suggestion would be to create a workflow in Talend like this: tFileInputFullRow -> tJavaRow -> tFileOutputDelimited. (By default tFileOutputDelimited is buggy, so it will leave your line intact; at least it was like that in Talend 5.) The tJavaRow strips the outer quotes and collapses the doubled ones:

     // drop the first and last character (the outer quotes), then turn "" back into "
     row2.line = row1.line.substring(1, row1.line.length() - 1).replace("\"\"", "\"");
    

    Then you can read the result with a tFileInputDelimited, using | as the field separator and " as the text enclosure.
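
If you want to try the idea outside Talend first, here is a minimal standalone Java sketch of the same two steps: unwrap the line, then split it on | while honouring the " enclosure. The class and method names (PipeUnwrap, unwrapLine, splitFields) are made up for illustration and are not part of Talend. The splitter assumes, as the question says, that the text values themselves contain no " characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not a Talend component.
class PipeUnwrap {

    // Remove the outer quote pair (if the line has one) and collapse "" into ".
    static String unwrapLine(String line) {
        String s = line;
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        return s.replace("\"\"", "\"");
    }

    // Split on | outside quotes and drop the enclosing quotes.
    // Assumption: field values never contain a literal quote character.
    static List<String> splitFields(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;          // enter or leave a quoted field
            } else if (c == '|' && !inQuotes) {
                fields.add(current.toString()); // delimiter ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());         // last field has no trailing |
        return fields;
    }

    public static void main(String[] args) {
        // The sample line from the question, written as a Java string literal.
        String raw = "\"NumValue1|\"\"TextValue2\"\"|\"\"TextValue3\"\"|\"\"TextValue\"\"\"";
        String unwrapped = unwrapLine(raw);
        System.out.println(unwrapped);
        System.out.println(splitFields(unwrapped));
    }
}
```

For the sample line this prints the pipe-delimited form NumValue1|"TextValue2"|"TextValue3"|"TextValue" and then the four bare field values. Checking for the outer quotes before stripping them means a line that was never wrapped passes through without losing characters.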