hadoop impala

Why does Impala generate multiple files for one INSERT statement?


I expected a single "insert...select" statement to produce only one file, but in my case it produces 20. How can I reduce the output to a single file?


Solution

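  • Impala runs an INSERT ... SELECT in parallel, and each node that takes part in the write produces its own output file, which is why one statement can yield many files. If the data is small, you can issue SET NUM_NODES=1 before the INSERT to force all of the data through a single node, as noted in the documentation.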
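
As a minimal sketch of the suggestion above, the session in impala-shell might look like the following. The table names target_table and source_table are hypothetical placeholders, not names from the question, and the final statement assumes you want to return to the default setting (0, meaning all nodes) afterwards.

```sql
-- Route the whole query through a single node so only one writer produces output.
SET NUM_NODES=1;

-- Hypothetical table names; substitute your own INSERT ... SELECT here.
INSERT INTO target_table
SELECT * FROM source_table;

-- Restore the default (0 = use all available nodes) for later queries.
SET NUM_NODES=0;
```

Because every row then passes through one node, this is only sensible for small data volumes; for large inserts the single node can become a bottleneck or run short of memory.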