HBase writes record updates (for a row key RK1) to an HFile. However, one of the older HFiles will still contain references to this row key RK1. How and when is this older reference to RK1 invalidated?
Assume there is an HFile containing the record for row key RK1. Then RK1 is updated, which means the update is written to a new HFile. The older HFile's reference to RK1 must be invalidated. How and when is this done in HBase?
Thanks.
HDFS files are immutable, so HBase never rewrites the old HFile in place: after the update, both the old and the new HFile contain an entry for RK1. Reads still return the right value, because every cell carries a timestamp and HBase serves the newest version. To avoid accumulating a large number of HFiles in HDFS, HBase periodically runs a compaction, which merges several small HFiles into one larger HFile and then deletes the small ones. The old entry for RK1 stays on disk until the HFile that holds it takes part in a compaction. A minor compaction gives no guarantee of that, because it runs on only a few HFiles. A major compaction merges all of them. To force the old values to be deleted, trigger a major compaction. Be careful, though: on a huge table a major compaction can run for hours.
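To make the idea concrete, here is a toy model in Python. It is not HBase code and does not use any HBase API: the names `read` and `compact`, and the representation of an HFile as a plain dictionary, are assumptions made for illustration only. It also keeps a single version per row, whereas real HBase keys cells by row, column and timestamp and can retain several versions.

```python
# Toy model: each "HFile" is an immutable mapping row_key -> (timestamp, value).
# All names here are hypothetical; this only sketches the idea of
# "newest timestamp wins" and "compaction merges files".

def read(row_key, hfiles):
    """Return the value with the highest timestamp found in any file."""
    cells = [f[row_key] for f in hfiles if row_key in f]
    if not cells:
        return None
    return max(cells, key=lambda cell: cell[0])[1]


def compact(hfiles):
    """Merge the given files into one new file, keeping the newest cell per row."""
    merged = {}
    for f in hfiles:
        for row_key, cell in f.items():
            if row_key not in merged or cell[0] > merged[row_key][0]:
                merged[row_key] = cell
    return merged


old_file = {"RK1": (100, "v1"), "RK2": (100, "a")}
new_file = {"RK1": (200, "v2")}  # the update to RK1 lands in a new file

# Before compaction both files hold RK1, but a read sees only the newest one.
print(read("RK1", [old_file, new_file]))

# A "major" compaction over all files drops the stale RK1 cell.
print(compact([old_file, new_file]))
```

In this sketch a "minor" compaction would simply be `compact` called on a subset of the files, which is why a stale entry in a file outside that subset survives. On a real cluster, a major compaction can be requested from the HBase shell with `major_compact 'table_name'` (the table name here is a placeholder).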