I am trying to change the storage configuration of Apache Drill in embedded mode so that it recognizes header rows and uses a different delimiter for CSV files. I also renamed the new format entry from csv to sap.
Following the documentation, I created this JSON storage configuration:
{
  "type": "file",
  "enabled": true,
  "connection": "file:///",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "sap": {
      "type": "text",
      "extensions": [
        "sap"
      ],
      "skipFirstLine": false,
      "extractHeader": true,
      "delimiter": "|"
    },
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json"
    },
    "avro": {
      "type": "avro"
    }
  }
}
But whenever I try to save it in the web UI, I get the message: error (invalid json mapping).
The option exec.storage.enable_new_text_reader is set to true.
Could somebody help me add the two config items skipFirstLine and extractHeader?
BR
Drill can parse the header row of a text file (CSV, TSV, etc.) as of Drill 1.3. See the documentation for details.
See also the release notes for Drill 1.3 and the CSV header parsing issue for more details.
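As a minimal sketch, assuming Drill 1.3 or later (where the text format documentation lists extractHeader and skipFirstLine as supported attributes), the sap entry inside "formats" could look like the fragment below. The name sap, its extension, and the delimiter are taken from the question. On an older release these attributes are probably not recognized, which would be consistent with the "invalid json mapping" error, but that is an assumption rather than something confirmed here.

```json
"sap": {
  "type": "text",
  "extensions": [
    "sap"
  ],
  "extractHeader": true,
  "delimiter": "|"
}
```

With extractHeader set to true, the first line is used for column names, so skipFirstLine can simply be left out instead of being set to false.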