Search code examples
databasehivehive-metastoretable-statistics

What is the implication of getting "No stats" for certain columns in hive?


I get the following in my log file, but don't know what is the implication:

query:

create table mydb.new as

select a.col1,b.col4,a.col3 from 

sampledb.table1 a
left join sampledb.table2 b
on a.col3=b.col3
;


No Stats for sampledb@table1, Columns: col_1
No Stats for sampledb@table2, Columns: col_4

Can someone help with this or guide me where to look?

Related question with no answers:Hive No Stats for 'database'@'table', Columns


Solution

  • No stats can result in suboptimal plan (for join columns) or table scan instead of using statistics only for simple queries:

    CBO uses statistics when creating query plan. Absence of statistics or stale statistics may result in sub-optimal query plan. If there is no stats, CBO makes some estimations based on files size/avg row size (see hive.stats.avg.row.size configuration and other statistics related settings).

    Also if hive.compute.query.using.stats=true and statistics exists, then optimizer is using statistics for simple query (for example select count(col1) ...) calculation instead of querying table data (this may lead to wrong query results if the stats is stale), see this answer.

    Also read this design document for more details about stats: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-StatisticsinHive