hadoop, hbase

HBase : HFile stats not changed after flush


I have an HBase table 'emp'. I created some rows in it using hbase-shell, and the biggest rowkey among them is 123456789. When I check in the HBase UI (the web console) by following this path:

regions -> emp,,1582232348771.4f2d545621630d98353802540fbf8b00. -> hdfs://namenode:9000/hbase/data/default/emp/4f2d545621630d98353802540fbf8b00/personal data/15a04db0d3a44d2ca7e12ab05684c876 (store file) 

I can see Key of biggest row: 123456789, so everything is good.

The problem came when I deleted the row with rowkey 123456789 using hbase-shell. I also put some other rows, then finally flushed the table with flush 'emp'.

I see that a second HFile was generated, but the Key of biggest row of the first HFile is still 123456789.

So I am very confused: this row no longer exists in my HBase table, and I already did a flush (so everything in the memstore should be in an HFile). Why do the stats still show this rowkey? What is going on behind the scenes? And how can I update the stats?


Solution

  • You're correct that everything in the memstore is now in HFiles. However, HFiles are immutable: the delete only wrote a delete marker into the new, second HFile, so until a compaction takes place the deleted row still exists in the first HFile.

    If you force a major compaction with major_compact 'table_name', 'col_fam', you should see this record disappear (and be left with one HFile). Maybe there's a small bug in the stats that doesn't take deleted records into account?
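
The behaviour described above can be illustrated with a small toy model. This is only a sketch in plain Python, not HBase code: the ToyStore class and its methods (put, delete, flush, biggest_row, major_compact) are hypothetical names made up for illustration, and real HBase tracks cells, timestamps and column families that are ignored here. It assumes each flush writes one immutable, sorted file, a delete is stored as a marker, and only a major compaction rewrites the files.

```python
# Toy model of memstore / HFile behaviour. NOT HBase code: all names are hypothetical.
TOMBSTONE = object()  # stands in for a delete marker


class ToyStore:
    def __init__(self):
        self.memstore = {}  # mutable, in-memory writes
        self.hfiles = []    # immutable "files", oldest first

    def put(self, row, value):
        self.memstore[row] = value

    def delete(self, row):
        # A delete is just another write: a marker, not a physical removal.
        self.memstore[row] = TOMBSTONE

    def flush(self):
        # Write the memstore out as a new immutable file sorted by row key.
        if self.memstore:
            entries = sorted(self.memstore.items(), key=lambda kv: kv[0])
            self.hfiles.append(tuple(entries))
            self.memstore = {}

    def biggest_row(self, index):
        # Per-file stat: the file is sorted, so the last entry has the biggest key.
        return self.hfiles[index][-1][0]

    def get(self, row):
        # Newest data wins: check the memstore, then files from newest to oldest.
        sources = [self.memstore] + [dict(f) for f in reversed(self.hfiles)]
        for source in sources:
            if row in source:
                value = source[row]
                return None if value is TOMBSTONE else value
        return None

    def major_compact(self):
        # Merge all files (newer entries override older ones), drop deleted rows.
        merged = {}
        for hfile in self.hfiles:
            merged.update(dict(hfile))
        live = sorted((r, v) for r, v in merged.items() if v is not TOMBSTONE)
        self.hfiles = [tuple(live)] if live else []


store = ToyStore()
store.put("100", "alice")
store.put("123456789", "bob")
store.flush()                   # first file
store.delete("123456789")
store.put("200", "carol")
store.flush()                   # second file holds the delete marker

print(store.get("123456789"))   # None: reads honour the marker
print(store.biggest_row(0))     # 123456789: the first file is unchanged
store.major_compact()
print(len(store.hfiles), store.biggest_row(0))  # 1 200
```

In this model the per-file stat keeps reporting 123456789 after the flush because the first file is never modified; only the compaction step, which writes a fresh file, makes the row disappear from the stats.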