Tags: sql, parquet, partition, apache-drill

Convert .json dataset to .parquet without any partitions in Apache Drill


I have been working with a dataset, business.json, and I am extracting the required table to .parquet files:

0: jdbc:drill:zk=local> use dfs.tmp;
0: jdbc:drill:zk=local> ALTER SESSION SET `store.format` = 'parquet';

After running my commands, Drill reports:

+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 3221419                    |
+-----------+----------------------------+
1 row selected (276.773 seconds)

I end up with several .parquet files: 0_0_0.parquet, 0_0_1.parquet, 0_0_2.parquet

How do I get a single .parquet file (0_0_0.parquet) without any partitions?


Solution

  • Because you have many rows, Drill parallelizes the execution. Consider adjusting the following configuration options [1]:

    planner.slice_target
    planner.width.max_per_node
    planner.width.max_per_query
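    A minimal sketch of the session settings, assuming the goal is to force the write onto a single fragment. The values are illustrative, not tuned recommendations: planner.width.max_per_node and planner.width.max_per_query cap the number of parallel fragments, and a very large planner.slice_target raises the estimated row count Drill needs before it splits the work. Note that the output in the question already reports a single fragment (0_0). Drill appears to name output files as fragment plus a file index, so 0_0_0, 0_0_1 and 0_0_2 may come from one writer rolling over to a new file each time store.parquet.block-size is reached. If so, raising that option (in bytes, at most 2147483647) may be what actually reduces the file count.

```sql
-- Illustrative values only; run in the same session before re-running the write.
-- Limit each node, and the query as a whole, to a single parallel fragment.
ALTER SESSION SET `planner.width.max_per_node` = 1;
ALTER SESSION SET `planner.width.max_per_query` = 1;
-- Require a very large estimated row count before Drill parallelizes.
ALTER SESSION SET `planner.slice_target` = 100000000;
-- Assumption: a single writer may start a new file when this size (bytes) is hit;
-- 1 GiB here, versus the 536870912-byte (512 MiB) default.
ALTER SESSION SET `store.parquet.block-size` = 1073741824;
```

    These settings only affect the current session, so re-run the original command afterwards in the same session.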
    

    [1] https://drill.apache.org/docs/configuration-options-introduction/