Impala: when to refresh tables?


Using Impala, I noticed that performance deteriorates when I run truncate and insert operations on internal tables several times. The question is: can refreshing the tables avoid the problem? So far I have used refresh only for external tables, every time I copied files to HDFS to be loaded into those tables.

Many thanks in advance! Moreno


Solution

  • You can use compute stats instead of refresh.

    Refresh is normally used when you add a data file or change something in the table metadata, such as adding a column or partition, or changing a column. It quickly reloads the metadata for that table. There is a related command, invalidate metadata, but it is more expensive than refresh: it marks the metadata as stale and forces Impala to reload it the next time the table is referenced in a query.

    compute stats: this gathers statistics for the table and its columns, and should be rerun when around 30% of the data has changed. It's an expensive operation, but effective when you do frequent truncate and load.
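
    As a rough sketch of how the three commands fit together, assuming a hypothetical internal table `sales_staging`, a hypothetical source table `sales_source`, and a hypothetical external table `sales_ext` (none of these names come from the question):

    ```sql
    -- Truncate-and-reload cycle on an internal table (hypothetical names).
    TRUNCATE TABLE sales_staging;
    INSERT INTO sales_staging SELECT * FROM sales_source;

    -- Recompute table and column statistics after the reload so the
    -- planner does not work from stale or missing stats.
    COMPUTE STATS sales_staging;

    -- Check what the planner now knows (row counts, file counts, sizes).
    SHOW TABLE STATS sales_staging;

    -- External table: files were copied into its HDFS directory outside
    -- of Impala, so reload the file metadata for this one table.
    REFRESH sales_ext;

    -- Heavier alternative: mark the table's metadata as stale so it is
    -- reloaded the next time the table is queried.
    INVALIDATE METADATA sales_ext;
    ```

    Data written through Impala's own insert does not normally need a refresh, which is why the reload cycle above ends with compute stats rather than refresh.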