
How does HBase calculate the flush size?


I am trying to better understand the memstore flush algorithm in HBase.

I have a simple (Snappy-compressed) table with one column family, and I have configured HBase as follows (there are a couple of regions on this region server):

  • hbase.hregion.memstore.flush.size: 128 MiB
  • Java Heap Size of HBase RegionServer in Bytes: 10 GiB
  • hbase.regionserver.global.memstore.upperLimit: 0.4
  • hbase.regionserver.global.memstore.size.lower.limit: 0.95

Based on the logs, flushes seem to happen at around the 70 MB mark. What I see in the logs repeatedly is something similar to this:

DefaultStoreFlusher Flushed memstore data size=68.14 MB at sequenceid=12561

Why not 128 MB?


Solution

  • The data size in the log is the sum of the cell data alone (key bytes + value bytes). This is the actual data that will be flushed to the HFile. The heap usage for the same data is usually larger: along with each cell's data, it includes metadata and index overhead. A flush happens when the memstore's heap size, not its data size, reaches hbase.hregion.memstore.flush.size. The flush log line may also report the heap size.
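
To make the gap between data size and heap size concrete, here is a rough arithmetic sketch in Java. It does not use any HBase API. The class name, the method name, and the per-cell numbers (100 bytes of key + value data and 88 bytes of heap overhead per cell) are assumptions chosen purely for illustration; the real overhead depends on the HBase version, the cell layout, and JVM settings.

```java
public class FlushSizeSketch {
    // Mirrors hbase.hregion.memstore.flush.size = 128 MiB from the question.
    static final long FLUSH_SIZE_BYTES = 128L * 1024 * 1024;

    /**
     * Estimates the data size (key + value bytes) held in the memstore at the
     * moment its heap size reaches the flush threshold.
     *
     * avgCellDataBytes and perCellOverheadBytes are hypothetical inputs,
     * not values read from HBase.
     */
    static long dataSizeAtFlush(long flushSizeBytes,
                                long avgCellDataBytes,
                                long perCellOverheadBytes) {
        long heapPerCell = avgCellDataBytes + perCellOverheadBytes;
        // Number of cells needed for the heap size to reach the threshold
        // (rounded up, since the flush triggers once the threshold is reached).
        long cells = (flushSizeBytes + heapPerCell - 1) / heapPerCell;
        // Only the key + value bytes count toward the reported data size.
        return cells * avgCellDataBytes;
    }

    public static void main(String[] args) {
        // Assumed: 100 bytes of cell data, 88 bytes of heap overhead per cell.
        long dataBytes = dataSizeAtFlush(FLUSH_SIZE_BYTES, 100, 88);
        System.out.printf("estimated data size at flush: %.2f MiB%n",
                dataBytes / (1024.0 * 1024.0));
    }
}
```

With these assumed numbers, only a little over half of the 128 MiB heap threshold is actual cell data, which is in the same range as the roughly 68 MB seen in the log. Smaller cells carry proportionally more overhead, so a table with short keys and values would show an even lower data size at flush time.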