Search code examples
pentahopentaho-spoonpentaho-data-integration

How to escape double quotes (") while reading data in Pentaho


I have a csv source file with comma (,) delimiter and values are enclosed with double quotes (") and using Text file Input to read the data in PDI 8.3. I am using , in Separator and " in Enclosure options in Content tab.

However, there is a field that contains quotes within the double quotes in the values itself, see the example below:

"abc","cde", "abc" - 1st col "cde" - 2nd col "ef"A"gh" - 3rd col "ijk" - 4th col and so on..

And issue in the 3rd col, in output it's reading "ef" as 3rd col and remaining values is passing to the next subsequent col. Hope I am able to clarify the issue here, only Expecting to escape the " within the values.

I have tried " in the Escape option but it's not working. Can someone please suggest how to handle this.

Thanks!


Solution

  • As @Dialvive says you can leave the enclosure empty on the content page of the CSV file input or Text file input. However, keep in mind the enclosure is there for a reason:

    • If any of the entries might contain a seperator as part of the value used in a field, then this will go wrong, since the seperator is no longer protected by enclosure symbols.
    • When using this sollution be sure to add another string operation step (multiple options) afterwards deleting any leading or trailing double quotes, to clean the ignored enclosures manually.

    If the seperator is part of at least one of the datafields in the data file, then please let your data provider know to either:

    1. use a more unique seperator, or
    2. to prohibit the use of the enclosure symbol in the text, or
    3. to include an escape symbol in the text when using the enclosure symbol. (which also can be set on the content tab of the csv file input step).

    For example the line ["I said: "Hello World!""] will then be with a slash escape ["I said: /"Hello World!/""] which can be handled fine with setting up enclosure as "and escape as /