
Apache Drill JSON storage configuration error (invalid json mapping)


I am trying to change the storage configuration in Apache Drill (embedded mode) so that Drill recognizes header rows and uses a different delimiter for CSV files. I also renamed the new format entry from csv to sap.

Using the information from the documentation, I created the following JSON storage configuration:

{
  "type": "file",
  "enabled": true,
  "connection": "file:///",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "sap": {
      "type": "text",
      "extensions": [
        "sap"
      ],
      "skipFirstLine": false,
      "extractHeader": true,
      "delimiter": "|"
    },
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json"
    },
    "avro": {
      "type": "avro"
    }
  }
}

But whenever I try to save it in the web UI, I get the message: error (invalid json mapping).

The option exec.storage.enable_new_text_reader is set to true.

Could somebody help me add the two config items, skipFirstLine and extractHeader?

BR


Solution

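  • Drill can parse the header row of a text file (CSV, TSV, etc.) as of Drill 1.3. Check the documentation for this. On an older version the extractHeader option is probably not recognized, which would explain the mapping error.

    See the release notes for Drill 1.3 and the CSV header parsing issue for more details.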
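
Before upgrading, it can help to rule out a plain syntax problem in the pasted configuration. Here is a minimal sketch in Python that does this outside of Drill. It uses a trimmed copy of the configuration from the question, with only the sap format kept. The function name check_storage_config is made up for this example and is not part of Drill. If the text parses cleanly, the web UI error most likely comes from an option name that the installed Drill version does not know, not from malformed JSON.

```python
import json

# Trimmed copy of the storage configuration from the question
# (only one workspace and the "sap" format are kept for brevity).
CONFIG_TEXT = """
{
  "type": "file",
  "enabled": true,
  "connection": "file:///",
  "workspaces": {
    "root": {"location": "/", "writable": false, "defaultInputFormat": null}
  },
  "formats": {
    "sap": {
      "type": "text",
      "extensions": ["sap"],
      "skipFirstLine": false,
      "extractHeader": true,
      "delimiter": "|"
    }
  }
}
"""


def check_storage_config(text):
    """Hypothetical helper: return (ok, detail).

    ok is False when the text is not well-formed JSON; detail then holds
    the parser's error message. Otherwise detail maps each format name to
    the sorted list of option names it uses.
    """
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    # Collect the option names per format so they can be compared by hand
    # with the options documented for the installed Drill version.
    options = {name: sorted(fmt) for name, fmt in config.get("formats", {}).items()}
    return True, options


if __name__ == "__main__":
    ok, detail = check_storage_config(CONFIG_TEXT)
    print("well-formed:", ok)
    print(detail)
```

This only checks syntax and lists the option names. It does not know which options a given Drill release accepts, so that comparison still has to be done against the documentation for your version.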