Finding Parquet File Created with Apache Drill


After reading this post: http://tgrall.github.io/blog/2015/08/17/convert-csv-file-to-apache-parquet-dot-dot-dot-with-drill/

I'm trying to convert a CSV file to a Parquet file. I can successfully query my CSV:

select * from dfs.`/Users/[username]/Desktop/drill_example.csv` limit 5;

with an output of:

+-------------------+
|      columns      |
+-------------------+
| ["1","UT","M\r"]  |
| ["2","CA","M\r"]  |
| ["3","CA","F\r"]  |
| ["4","NJ","M\r"]  |
| ["5","FL","F\r"]  |
+-------------------+

I then change the format to Parquet via:

alter session set `store.format`='parquet';

with an output of:

+-------+------------------------+
|  ok   |        summary         |
+-------+------------------------+
| true  | store.format updated.  |
+-------+------------------------+

I then create the new table/file using this code:

CREATE TABLE dfs.tmp.`/Users/[username]/Desktop/drill_example_parquet` AS
select * from dfs.`/Users/[username]/Desktop/drill_example.csv`;

with the following output:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 10000                      |
+-----------+----------------------------+
1 row selected (1.292 seconds)

The table/file is created because I can query it with this code:

SELECT *
FROM dfs.tmp.`/Users/[username]/Desktop/drill_example_parquet`;

but I can't find the file on my computer. How do I get the actual Parquet file (not just the table), that is, the Parquet version of the CSV file on my Desktop? Do I have to export it somehow? Also, how do I delete these tables once I'm done?

Thanks in advance.


Solution

  • Check your dfs storage plugin configuration in the Drill web UI (xx.xx.xx.xx:8047/storage/dfs).

    By default, the tmp workspace is defined as:

    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
    

    The table path in your CREATE TABLE statement is resolved relative to this workspace location, so your output will be at (assuming you have not changed the tmp location):

    /tmp/Users/[username]/Desktop/drill_example_parquet
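
The question also asks how to delete the table afterwards. A minimal sketch, assuming a Drill version that supports DROP TABLE (1.2 or later, to my knowledge) and the default tmp workspace shown above. Note that Drill writes the table as a directory, so the Parquet data itself should be in one or more files inside it (typically named along the lines of 0_0_0.parquet):

```sql
-- Confirm the table is still readable before removing it.
SELECT COUNT(*)
FROM dfs.tmp.`/Users/[username]/Desktop/drill_example_parquet`;

-- Drop the table. For a dfs workspace this deletes the table directory
-- (here /tmp/Users/[username]/Desktop/drill_example_parquet) and the
-- Parquet files in it, so copy them elsewhere first if you need them.
DROP TABLE dfs.tmp.`/Users/[username]/Desktop/drill_example_parquet`;
```

If you only want the Parquet file, copying it out of that directory with your normal file tools should be enough; no separate export step is involved.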