Tags: mapreduce, hive, hiveql, hadoop2

Hive aggregate query returns wrong value from cache


I am running an aggregate query in a Hive session.

hive> select count(1) from table_name;

The first time, it runs a MapReduce job and returns the result. On subsequent runs later in the day, however, it returns the same count from the cache, even though the table is updated hourly, so the count is wrong.

What I tried:

set hive.metastore.aggregate.stats.cache.enabled=false;

set hive.cache.expr.evaluation=false;

set hive.fetch.task.conversion=none;

None of these helped. I am using Hive version 1.2.1.2.3.4.29-5. Thanks.


Solution

  • Disable using stats for query calculation:

    set hive.compute.query.using.stats=false;
    

    See also this answer for more details: https://stackoverflow.com/a/41021682/2700344
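
    As a rough sketch of why this helps: when hive.compute.query.using.stats is enabled, Hive may answer simple aggregates such as count(1) from table statistics stored in the metastore instead of scanning the data, so the count can be stale if those statistics are not refreshed when new files land in the table. The session below uses the placeholder table_name from the question. The ANALYZE TABLE alternative is an assumption added for illustration and is not part of the original answer.

```sql
-- Option 1 (from the answer): do not answer aggregates from metastore statistics.
-- This applies to the current session only.
set hive.compute.query.using.stats=false;
select count(1) from table_name;

-- Option 2 (assumption, not from the original answer): leave the setting enabled
-- and refresh the table statistics after each hourly load instead.
-- table_name is the placeholder used in the question.
analyze table table_name compute statistics;
select count(1) from table_name;
```

    Option 1 forces a full scan on every count, which is slower but always reflects the current data. Option 2 keeps the fast statistics-based answer but is only correct if the ANALYZE step runs after every load; for a partitioned table the statement would likely also need a partition clause.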