I have about 10,000 records (stored in an ArrayList in Java) that I want to insert into Impala.
Should I use an INSERT INTO ... PARTITION (...) VALUES statement to insert them directly into Impala? (I am not sure how many records can be inserted in one SQL statement.)
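To make the first option concrete, here is a minimal sketch. It assumes a hypothetical table events that is partitioned by a string column hr (for example '2015-06-01-13'), and a hypothetical record type with an id and a name. Instead of one statement with 10,000 rows, it splits the list into chunks and builds one multi-row INSERT ... VALUES statement per chunk. The chunk size is a tuning knob, not a documented Impala limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turns the in-memory records into a few multi-row
// INSERT ... VALUES statements instead of one giant statement.
final class ImpalaInsertBuilder {

    // Hypothetical record type standing in for the elements of the ArrayList.
    static final class Rec {
        final long id;
        final String name;

        Rec(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Quotes a string literal; assumes Impala's backslash escapes (\\ and \') apply.
    static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Builds one INSERT statement per chunk of at most chunkSize rows.
    // The table name and the partition column "hr" are hypothetical.
    static List<String> buildInserts(String table, String hour, List<Rec> rows, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> statements = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, rows.size());
            StringBuilder sql = new StringBuilder("INSERT INTO ")
                    .append(table)
                    .append(" PARTITION (hr=")
                    .append(quote(hour))
                    .append(") VALUES ");
            for (int i = start; i < end; i++) {
                Rec r = rows.get(i);
                if (i > start) {
                    sql.append(", ");
                }
                // One "(id, 'name')" tuple per record.
                sql.append('(').append(r.id).append(", ").append(quote(r.name)).append(')');
            }
            statements.add(sql.toString());
        }
        return statements;
    }
}
```

Each string could then be executed with a plain JDBC Statement.execute(sql) over an Impala JDBC connection. The connection setup is not shown because it depends on the driver. As I understand the Impala documentation, every INSERT ... VALUES statement writes a separate data file. This route therefore adds to the small-files concern raised in this question.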
Or should I write these records to HDFS and then alter the Impala table?
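Here is a minimal sketch of the second option, using the same hypothetical table, partition column and record names. It renders the batch as delimited text and leaves the copy into the partition's HDFS directory unshown (for example, hdfs dfs -put). It then builds the statements that register the partition and make Impala pick up the new file. It assumes a text-format table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' and field values that contain no commas or newlines. The HDFS path is also hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "write to HDFS, then tell Impala" route.
final class HdfsLoadSketch {

    // Hypothetical record type standing in for the elements of the ArrayList.
    static final class Rec {
        final long id;
        final String name;

        Rec(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Renders one line per record: id<delimiter>name. Assumes no field
    // contains the delimiter or a newline (no escaping is done).
    static String toDelimitedText(List<Rec> rows, char delimiter) {
        StringBuilder sb = new StringBuilder();
        for (Rec r : rows) {
            sb.append(r.id).append(delimiter).append(r.name).append('\n');
        }
        return sb.toString();
    }

    // Statements to run after the file has been copied into hdfsDir.
    // Table, partition column and path are hypothetical.
    static List<String> registerStatements(String table, String hour, String hdfsDir) {
        return Arrays.asList(
                "ALTER TABLE " + table + " ADD IF NOT EXISTS PARTITION (hr='" + hour
                        + "') LOCATION '" + hdfsDir + "'",
                "REFRESH " + table);
    }
}
```

The ALTER TABLE ... ADD IF NOT EXISTS PARTITION statement only has to take effect once per hour. REFRESH is what makes Impala notice files that are later added to an existing partition.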
Which approach is preferred? Or are there other solutions?
Also, if I do this every 5 minutes, how can I avoid accumulating many small files in each partition (the table is partitioned by hour)? This will produce 12 small files per partition. Will that affect query speed?
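For the small-files concern, one common pattern is to land the 5-minute batches in a separate staging table. Once an hour has closed, that hour is rewritten into the main table with a single INSERT OVERWRITE ... SELECT. This is sketched here as an assumption, not a guaranteed fix. The staging table events_staging, the target table events, the hr partition column and its yyyy-MM-dd-HH format are all hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical sketch of hourly compaction for the small-files problem.
final class HourlyCompaction {

    private static final DateTimeFormatter HOUR_KEY = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH");

    // Rewrites one hour from the staging table into the target table's
    // partition. The column list must exclude the partition column hr,
    // because hr is given statically in the PARTITION clause.
    static String compactStatement(String stagingTable, String targetTable, String hour, List<String> columns) {
        String cols = String.join(", ", columns);
        return "INSERT OVERWRITE TABLE " + targetTable
                + " PARTITION (hr='" + hour + "') SELECT " + cols
                + " FROM " + stagingTable + " WHERE hr='" + hour + "'";
    }

    // Partition key of the hour that has just closed, in the hypothetical
    // yyyy-MM-dd-HH format.
    static String previousHour(LocalDateTime now) {
        return now.minusHours(1).format(HOUR_KEY);
    }
}
```

A scheduler could run compactStatement(..., previousHour(LocalDateTime.now()), ...) a few minutes past each hour. Impala may still write more than one file per partition when several nodes take part in the insert. Even so, the count should be much lower than one file per 5-minute batch. Afterwards, the staging partition can be dropped with ALTER TABLE ... DROP PARTITION.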
The best approach is the following:
I hope this answer helps.
Regards!