HBase: periodic bulk load of HFiles and its relation to minor compaction


I have a scenario where we have to load HFiles into an HBase table periodically, on a daily basis.

The HFile size for each run could be between 50 and 150 MB per region. These loads could happen 12 times a day and, in some cases, every 15 minutes.

While testing, I observed that minor compaction is not triggered immediately, even when a region has more than 3 files. This may become a problem, because many files end up holding rows for the same row key.

I have seen that the compaction thread wakes up only every 10000 seconds (roughly 2 hours 46 minutes); only then does it start compaction and put the compaction task in the queue.

Is there any configuration that tells HBase to trigger minor compaction as soon as 3 or more HFiles have been written by bulk load (completebulkload), irrespective of the size of the HFiles?

HBase version: HBase 1.1.2.2.6.5.4-1

Configuration:
   hbase.hstore.compaction.max = 10
   hbase.hstore.compactionThreshold = 3
   hbase.server.thread.wakefrequency = 10000
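
The 10000-second wake-up observed above is consistent with how the periodic compaction check is scheduled: its interval is hbase.server.thread.wakefrequency (in milliseconds) multiplied by hbase.server.compactchecker.interval.multiplier. The multiplier is not listed in the configuration above, so the sketch below assumes it is left at its default of 1000; the class and method names are illustrative only.

```java
import java.time.Duration;

/** Back-of-the-envelope check of the periodic compaction checker interval. */
final class CompactionCheckerPeriod {

    /** Interval between two runs of the periodic compaction check. */
    static Duration checkerPeriod(long wakeFrequencyMs, long intervalMultiplier) {
        return Duration.ofMillis(wakeFrequencyMs * intervalMultiplier);
    }

    public static void main(String[] args) {
        long wakeFrequencyMs = 10_000L; // hbase.server.thread.wakefrequency from the question
        long multiplier = 1_000L;       // ASSUMED default of hbase.server.compactchecker.interval.multiplier
        Duration period = checkerPeriod(wakeFrequencyMs, multiplier);
        System.out.printf("checker period: %d s (%d h %d min)%n",
                period.getSeconds(), period.toHours(), period.toMinutes() % 60);
    }
}
```

With these values the period works out to 10000 s, about 2 h 46 min. If that reading is right, lowering the multiplier should shorten the wait, at the cost of more frequent checks on every region server.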

Solution

  • Looking through the APIs, I found that a minor or major compaction can be requested asynchronously at the HBase table level.

    The HBase Admin API can be used to request a compaction on demand. This also helps avoid unnecessary region splits when bulk loads push redundant data frequently and the table uses the constant-size region split policy (ConstantSizeRegionSplitPolicy).

    Here is example code to do this in Java:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class Compaction {

        // Location of hbase-site.xml on the client node; adjust for your cluster.
        private static final String CONF_FILE = "/usr/hdp/current/hbase-client/conf/hbase-site.xml";

        private final TableName tableName;
        private final String compactionType;

        public Compaction(String tableName, String compactionType) {
            this.tableName = TableName.valueOf(tableName);
            this.compactionType = compactionType;
        }

        /**
         * Requests a compaction of the whole table. The request is asynchronous:
         * the call returns once the request is submitted, not when compaction ends.
         *
         * @return true if the request was submitted, false if it failed
         */
        public boolean perform() {
            Configuration conf = HBaseConfiguration.create();
            conf.addResource(new Path(CONF_FILE));

            System.out.println("Performing action: compact table " + tableName
                    + ", compaction type = " + compactionType);

            // try-with-resources closes the Admin and the Connection even on failure.
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Admin admin = connection.getAdmin()) {
                if ("major".equalsIgnoreCase(compactionType)) {
                    admin.majorCompact(tableName);
                } else {
                    // Any other value requests a minor compaction.
                    admin.compact(tableName);
                }
                return true;
            } catch (IOException ex) {
                System.err.println("Compaction request failed: " + ex.getMessage());
                return false;
            }
        }

        public static void main(String[] args) {
            if (args.length < 2) {
                System.err.println("Usage: Compaction <tableName> <major|minor>");
                System.exit(2);
            }
            boolean submitted = new Compaction(args[0], args[1]).perform();
            System.exit(submitted ? 0 : 1);
        }
    }
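
To get closer to "compact as soon as 3 or more HFiles have been bulk loaded", the job that runs completebulkload could keep its own count and request a compaction when the count reaches the threshold. The following is only a minimal sketch of that bookkeeping: BulkLoadCompactionTrigger, CompactionRequester and onBulkLoadCompleted are hypothetical names, not HBase API, and the count is kept per table, which is only a rough proxy for the per-store file count that HBase compares against hbase.hstore.compactionThreshold.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: count bulk-loaded HFiles and request a compaction at a threshold. */
final class BulkLoadCompactionTrigger {

    /** Hypothetical hook; a real implementation could wrap Admin.compact(TableName). */
    interface CompactionRequester {
        void requestMinorCompaction(String tableName);
    }

    private final String tableName;
    private final int threshold;
    private final CompactionRequester requester;
    private int filesSinceLastRequest = 0;

    BulkLoadCompactionTrigger(String tableName, int threshold, CompactionRequester requester) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.tableName = tableName;
        this.threshold = threshold;
        this.requester = requester;
    }

    /**
     * Call after each successful bulk load with the number of HFiles it added.
     * Returns true if this call caused a compaction request.
     */
    synchronized boolean onBulkLoadCompleted(int hfilesLoaded) {
        filesSinceLastRequest += hfilesLoaded;
        if (filesSinceLastRequest < threshold) {
            return false;
        }
        requester.requestMinorCompaction(tableName);
        filesSinceLastRequest = 0; // start counting again after the request
        return true;
    }

    public static void main(String[] args) {
        // Stand-in requester that only records the request; "my_table" is a placeholder.
        List<String> requests = new ArrayList<>();
        BulkLoadCompactionTrigger trigger =
                new BulkLoadCompactionTrigger("my_table", 3, requests::add);
        for (int run = 1; run <= 4; run++) {
            boolean requested = trigger.onBulkLoadCompleted(1);
            System.out.println("run " + run + ": compaction requested = " + requested);
        }
    }
}
```

In a real job the requester would delegate to the Admin call shown in the answer, so the compaction request is issued right after the third load instead of waiting for the periodic check.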