Query Extensionless File using Apache Drill


I imported data into Hadoop using Sqoop 1.4.6. Sqoop saves the imported data in HDFS as CSV-formatted files that have no file extension. When I query such a file with Apache Drill, I get a "Table not found" error. In the storage plugin configuration, I tried setting the extensions to null, an empty string (""), and a space (" "), but none of these let me query the file. The query works only after I rename the file to add an extension. Any extension listed in the configuration works, even an arbitrary one such as 'mat', as long as the file contents are CSV; only a null extension fails.

Is there any way to query the extensionless files?


Solution

  • You can set a default input format in the storage plugin configuration to solve this problem. For example, start with a file that has the .csv extension and can already be queried:

    select * from dfs.`/Users/khahn/Downloads/csv_line_delimit.csv`;
    +-------------------------+
    |         columns         |
    +-------------------------+
    | ["hello","1","2","3!"]  |
    . . .
    

    Rename the file to remove the extension, then set "location" and "defaultInputFormat" for the workspace in the plugin configuration:

    {
      "type": "file",
      "enabled": true,
      "connection": "file:///",
      "workspaces": {
        "root": {
          "location": "/Users/khahn/Downloads",
          "writable": false,
          "defaultInputFormat": "csv"
        },
    

    Then query the extensionless file through that workspace:

    0: jdbc:drill:zk=local> select * from dfs.root.`csv_line_delimit`;
    +-------------------------+
    |         columns         |
    +-------------------------+
    | ["hello","1","2","3!"]  |
    . . .
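
For the Sqoop/HDFS case in the question, the same change can be sketched as a small Python helper that sets "defaultInputFormat" on one workspace of a plugin configuration. This is only an illustration under stated assumptions: the helper name, the `hdfs://namenode:8020` connection, the `sqoop` workspace, and its location are all hypothetical and not taken from the original post. The sketch also assumes the chosen format is defined under "formats" in the same plugin.

```python
import copy
import json


def with_default_input_format(plugin_config, workspace, fmt):
    """Return a copy of plugin_config in which `workspace` reads files
    without a recognized extension as `fmt`. (Hypothetical helper.)"""
    updated = copy.deepcopy(plugin_config)
    if workspace not in updated.get("workspaces", {}):
        raise KeyError(f"unknown workspace: {workspace}")
    # Assumption: the default format should be one the plugin defines.
    if fmt not in updated.get("formats", {}):
        raise ValueError(f"format not defined in plugin: {fmt}")
    updated["workspaces"][workspace]["defaultInputFormat"] = fmt
    return updated


# Hypothetical starting configuration; host, port, workspace name and
# location are placeholders, not values from the original question.
base_config = {
    "type": "file",
    "enabled": True,
    "connection": "hdfs://namenode:8020",
    "workspaces": {
        "sqoop": {
            "location": "/user/example/sqoop_import",
            "writable": False,
            "defaultInputFormat": None,
        }
    },
    "formats": {
        "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","}
    },
}

new_config = with_default_input_format(base_config, "sqoop", "csv")
# Paste the printed JSON into the storage plugin configuration.
print(json.dumps(new_config, indent=2))
```

With a configuration like this in place, an extensionless Sqoop output file (Sqoop typically names them like part-m-00000) should be queryable as dfs.sqoop.`part-m-00000`, in the same way as the csv_line_delimit example above.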