
Performance issue when a single row in HBase exceeds hbase.hregion.max.filesize


In HBase, I have configured hbase.hregion.max.filesize as 10 GB. If a single row exceeds 10 GB, the row will not be split into two regions, because HBase splits are done on row key boundaries.

For example, if I have a row with 1000 columns, each varying between 25 MB and 40 MB, the row can exceed the configured region size. If this is the case, how will it affect performance when reading data by row key alone, or by row key with a column qualifier?
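For reference, the two read patterns in question look roughly like this (a minimal sketch assuming an open Table instance named table; the row key, family, and qualifier names are placeholders):

    // Read by row key alone: fetches every column of the row
    Get fullGet = new Get(Bytes.toBytes("row1"));
    Result fullRow = table.get(fullGet);

    // Read by row key plus column qualifier: fetches a single cell
    Get narrowGet = new Get(Bytes.toBytes("row1"));
    narrowGet.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col1"));
    Result singleCell = table.get(narrowGet);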


Solution

  • First of all, HBase is NOT meant for storing data that large (10 GB) in a single row (the scenario is quite hypothetical).

    I hope you have not actually saved 10 GB in a single row (and are merely thinking of doing so).

    It will adversely affect performance. Consider other approaches, such as storing this much data in HDFS in a partitioned structure (see the sketch at the end of this answer).

    In general, these are the tips for broadly applicable batch clients such as MapReduce HBase jobs:

    Scan scan = new Scan();
    scan.setCaching(500); //1 is the default in Scan, which will be bad for MapReduce jobs
    scan.setCacheBlocks(false);  // don't set to true for MR jobs
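
    For completeness, here is a rough sketch of how such a Scan can be handed to a MapReduce job via TableMapReduceUtil, reusing the scan configured above; the table name and MyMapper (a TableMapper subclass) are placeholders, not part of the original answer:

    Job job = Job.getInstance(HBaseConfiguration.create(), "hbase-batch-read");
    TableMapReduceUtil.initTableMapperJob(
        "mytable",                     // source table (placeholder)
        scan,                          // the Scan configured above
        MyMapper.class,                // your TableMapper subclass (placeholder)
        ImmutableBytesWritable.class,  // mapper output key class
        Result.class,                  // mapper output value class
        job);
    job.setNumReduceTasks(0);          // map-only scan job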
    

    You can also have a look at the Performance chapter of the HBase reference guide.
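
    As a sketch of the HDFS approach mentioned above: write each large value to a partitioned HDFS path and keep only that path in HBase, so rows stay far below hbase.hregion.max.filesize. All names here (paths, table, column family) are illustrative assumptions:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HdfsPayloadStore {
        // Writes one large column value to HDFS and records its path in HBase
        static void storePayload(String rowKey, String qualifier, byte[] payload)
                throws IOException {
            Configuration conf = HBaseConfiguration.create();
            FileSystem fs = FileSystem.get(conf);

            // Partitioned layout, e.g. one directory per row key
            Path path = new Path("/data/payloads/" + rowKey + "/" + qualifier);
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write(payload); // the 25-40 MB value lives in HDFS
            }

            // HBase stores only the small pointer to the payload
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("mytable"))) {
                Put put = new Put(Bytes.toBytes(rowKey));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes(qualifier),
                        Bytes.toBytes(path.toString()));
                table.put(put);
            }
        }
    }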