apache-drill

How to deal with unescaped commas in string fields when using `store.format`='csv' in a Drill table


When using a Drill script to convert a set of Parquet files to CSV, I'm running into a problem where some of the fields have commas in them. This is an issue because Drill does not appear to automatically add any quoting/escaping characters ("<field>" or '<field>') around the fields in the converted files (e.g. for string values that look like "Soft-drink, Large").

The script looks like

/opt/mapr/drill/drill-1.11.0/bin/sqlline \
        -u jdbc:drill:zk=node001:5181,node002:5181,node003:5181 \
        -n $(tail -n+1 $basedir/src/drill-creds.txt | head -1) \
        -p $(tail -n+2 $basedir/src/drill-creds.txt | head -1) \
        --run=$sqldir

where the SQL being run by Drill looks like

alter session set `store.format`='csv';
create table dfs.myworkspace.`/path/to/csv/destination` as 
select .... 
from dfs.myworkspace.`/path/to/origin/files`

Does anyone have a common way of working around this? Is there a way to add quoting/escaping characters to the converted CSV files? (I tried checking the docs (https://drill.apache.org/docs/create-table-as-ctas/), but could find nothing related.)


Solution

  • You can use the TSV or PSV formats, or update the delimiter symbol for CSV cells by configuring the Format Plugin of your File Storage Plugin:

    In the Drill Web UI choose Storage, then Update the dfs storage plugin, find csv under formats, and add a delimiter property with the desired value.

    "csv": {
      "type": "text",
      "extensions": [
        "csv2"
      ],
      "skipFirstLine": false,
      "extractHeader": true,
      "delimiter": "^"
    },
    

    Also, please check the escape property; see the sketch after the link below.

    See more: https://drill.apache.org/docs/text-files-csv-tsv-psv/#use-a-distributed-file-system
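
    For reference, here is a sketch of what a csv format entry might look like with the quote and escape attributes set explicitly. These attributes are listed on the docs page linked above; the values shown are the documented defaults, and whether the CTAS writer applies them when writing output can depend on your Drill version, so treat this as an illustration rather than a guaranteed fix:

    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "skipFirstLine": false,
      "extractHeader": true,
      "delimiter": ",",
      "quote": "\"",
      "escape": "\""
    },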