I have been working with a dataset, business.json, and extracting the required table to .parquet files:
0: jdbc:drill:zk=local> use dfs.tmp;
0: jdbc:drill:zk=local> ALTER SESSION SET `store.format` = 'parquet';
After running my commands:
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 3221419                    |
+-----------+----------------------------+
1 row selected (276.773 seconds)
I am getting several partitioned .parquet files: 0_0_0.parquet, 0_0_1.parquet, 0_0_2.parquet
How do I get a single .parquet file, 0_0_0.parquet, without any partitions?
Since you have many rows, Drill parallelizes execution. Consider adjusting the following configuration options [1]:
planner.slice_target
planner.width.max_per_node
planner.width.max_per_query
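As a rough sketch, these options can be set for the current session before re-running the extraction. The values below are illustrative assumptions, not tuned recommendations: they only limit how far Drill parallelizes the query, and whether that results in a single output file may depend on the query and on other writer settings.

```sql
-- Raise the per-fragment row threshold so Drill is less inclined to parallelize
-- (10000000 is an assumed value chosen to exceed the ~3.2M rows written above).
ALTER SESSION SET `planner.slice_target` = 10000000;

-- Cap the number of parallel fragments per node and per query at one.
ALTER SESSION SET `planner.width.max_per_node` = 1;
ALTER SESSION SET `planner.width.max_per_query` = 1;
```

The settings last only for the current session, so other queries are not affected once you disconnect.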
[1] https://drill.apache.org/docs/configuration-options-introduction/