I have a table with an avro schema where I insert tweets periodically.
CREATE TABLE tweets
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///tweet_schema.avsc');
When I run
SELECT COUNT(*) FROM tweets;
I get only the count of the rows added by the last INSERT statement:
INSERT INTO TABLE tweets SELECT tweet FROM another_table;
Similarly, if I run
TRUNCATE TABLE tweets;
Querying for the count returns the table rows count before truncating, however, selecting everything returns nothing as expected.
Is this the expected behavior?
If it's indeed the expected behavior, what should I be doing to get the total number of rows?
I'm running Hive 0.13
It turned to be a bug, I reported it and got resolved in 0.14.0. You can watch the issue here HIVE-7213
A workaround is to:
set hive.compute.query.using.stats=false;