Using Impala, I noticed a deterioration in performance when I repeatedly run truncate and insert operations on internal tables. The question is: can refreshing the tables avoid the problem? So far I have used refresh only for external tables, each time I copied files to HDFS to be loaded into those tables.
Many thanks in advance! Moreno
You can use compute stats instead of refresh.
Refresh is normally used when a data file is added or something in the table metadata changes outside of Impala (for example through Hive or by copying files into HDFS), such as adding a column or partition or changing a column. It quickly reloads the metadata for that table. There is another related command, invalidate metadata, but it is more expensive than refresh: it discards the cached metadata and forces Impala to reload it the next time the table is referenced in a query.
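As a rough illustration of the two metadata commands, here is a minimal sketch. The database and table name (sales_db.events_ext) are made up for the example and are not from the question.

```sql
-- Hypothetical table name, used only for illustration.

-- New files were copied into the table's HDFS directory:
-- reload the file and block metadata for this one table.
REFRESH sales_db.events_ext;

-- Heavier alternative: mark the cached metadata for the table as stale.
-- Impala reloads it the next time a query references the table.
INVALIDATE METADATA sales_db.events_ext;
```

Naming the table in invalidate metadata keeps the cost limited to that table; running it without a table name affects every table in the catalog.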
compute stats computes statistics for the table and its columns. It is worth running when around 30% of the data has changed. It is an expensive operation, but effective when you frequently truncate and reload.
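A truncate-and-load cycle followed by compute stats could look roughly like the sketch below. The table names (sales_db.events and sales_db.events_staging) are hypothetical, and the staging table is assumed to have the same column layout as the target.

```sql
-- Hypothetical table names, used only for illustration.

-- 1. Empty the internal table.
TRUNCATE TABLE sales_db.events;

-- 2. Reload it (assumes the staging table has matching columns).
INSERT INTO sales_db.events
SELECT * FROM sales_db.events_staging;

-- 3. Recompute table and column statistics so the planner
--    does not work from missing or stale numbers.
COMPUTE STATS sales_db.events;

-- 4. Optional check: inspect the statistics that were gathered.
SHOW TABLE STATS sales_db.events;
SHOW COLUMN STATS sales_db.events;
```

Since both the truncate and the insert run through Impala here, no refresh should be needed in this flow; refresh matters when the files change behind Impala's back.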