Tags: java, hbase, jruby

How to mass delete multiple rows in HBase?


I have rows with the following keys in the HBase table "mytable":

user_1
user_2
user_3
...
user_9999999

I want to use the HBase shell to delete the rows from:

user_500 to user_900

I know there is no built-in way to delete a range of rows from the shell, but is there a way I could use the "BulkDeleteProcessor" to do this?

I see here:

https://github.com/apache/hbase/blob/master/hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestBulkDeleteProtocol.java

I want to just paste in the imports and then paste this into the shell, but have no idea how to go about it. Does anyone know how I can use this endpoint from the JRuby HBase shell?

    Table ht = TEST_UTIL.getConnection().getTable("my_table");
    long noOfDeletedRows = 0L;
    Batch.Call<BulkDeleteService, BulkDeleteResponse> callable =
        new Batch.Call<BulkDeleteService, BulkDeleteResponse>() {
          ServerRpcController controller = new ServerRpcController();
          BlockingRpcCallback<BulkDeleteResponse> rpcCallback =
              new BlockingRpcCallback<BulkDeleteResponse>();

          public BulkDeleteResponse call(BulkDeleteService service) throws IOException {
            Builder builder = BulkDeleteRequest.newBuilder();
            builder.setScan(ProtobufUtil.toScan(scan));
            builder.setDeleteType(deleteType);
            builder.setRowBatchSize(rowBatchSize);
            if (timeStamp != null) {
              builder.setTimestamp(timeStamp);
            }
            service.delete(controller, builder.build(), rpcCallback);
            return rpcCallback.get();
          }
        };
    Map<byte[], BulkDeleteResponse> result = ht.coprocessorService(BulkDeleteService.class,
        scan.getStartRow(), scan.getStopRow(), callable);
    for (BulkDeleteResponse response : result.values()) {
      noOfDeletedRows += response.getRowsDeleted();
    }
    ht.close();

If there is no way to do this through JRuby, then a Java or other approach that quickly deletes multiple rows is fine too.


Solution

  • Do you really want to do it in the shell? There are various better ways. One is to use the native Java API:

    • Construct a list of Delete objects
    • Pass this list to the Table.delete(List<Delete>) method
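
    If the shell route is still wanted, one workaround (a sketch; it assumes the table is named 'mytable' and that the `hbase` binary is on your PATH) is to generate one `deleteall` command per key with plain Ruby and pipe the output into the shell:

    ```ruby
    # Generate one HBase-shell deleteall command per row key in the range.
    # Usage (hypothetical file name):  ruby gen_deletes.rb | hbase shell
    (500..900).each do |i|
      puts "deleteall 'mytable', 'user_#{i}'"
    end
    ```

    `deleteall 'mytable', 'user_500'` removes every cell in that row. For a few hundred rows this is usually fast enough, though each command is a separate round trip.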

    Method 1: if you already know the range of keys.

    public void massDelete(byte[] tableName) throws IOException {
        HTable table = (HTable) hbasePool.getTable(tableName);

        String tablePrefix = "user_";
        int startRange = 500;
        int endRange = 900; // the question asks for user_500 through user_900

        List<Delete> listOfBatchDelete = new ArrayList<Delete>();

        for (int i = startRange; i <= endRange; i++) {
            String key = tablePrefix + i;
            listOfBatchDelete.add(new Delete(Bytes.toBytes(key)));
        }

        try {
            // one bulk call instead of one RPC per row
            table.delete(listOfBatchDelete);
        } finally {
            if (hbasePool != null && table != null) {
                hbasePool.putTable(table); // return the table to the pool
            }
        }
    }
    

    Method 2: If you want to do a batch delete on the basis of a scan result.

    public void bulkDelete(final HTable table) throws IOException {
        Scan s = new Scan();
        // add your filter to the scan, e.g.:
        // s.setFilter(new PrefixFilter(Bytes.toBytes("user_")));
        List<Delete> listOfBatchDelete = new ArrayList<Delete>();
        ResultScanner scanner = table.getScanner(s);
        try {
            for (Result rr : scanner) {
                listOfBatchDelete.add(new Delete(rr.getRow()));
            }
            table.delete(listOfBatchDelete);
        } finally {
            scanner.close();
        }
    }
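
    One caveat with scan-based deletes here: HBase orders row keys byte-wise, so with unpadded numeric suffixes a scan from user_500 to user_900 also covers keys such as user_5000. A small sketch of the ordering (plain Ruby, no HBase needed):

    ```ruby
    # HBase stores rows in byte-wise (lexicographic) order, so with unpadded
    # numeric suffixes a scan from 'user_500' to 'user_900' also matches keys
    # like 'user_5000' -- it sorts between 'user_500' and 'user_900'.
    keys = %w[user_500 user_5000 user_899 user_900 user_9999999]
    p keys.sort  # byte-wise order, as HBase stores them
    ```

    If only the numeric range 500..900 should be deleted, either zero-pad the keys when writing them or enumerate the keys explicitly as in Method 1.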
    

    Now, coming to the coprocessor: my only advice is "don't use coprocessors" unless you are an expert in HBase. Coprocessors have many inbuilt issues (I can provide a detailed description if you need it). Secondly, when you delete anything from HBase it is never removed immediately: a tombstone marker is attached to the record, and the data is physically deleted later during a major compaction. So there is no need to use a coprocessor, which is highly resource-intensive, for this.

    Modified code to support batch operation.

    int batchSize = 50;
    List<Delete> listOfBatchDelete = new ArrayList<Delete>();

    for (int i = startRange; i <= endRange; i++) {
        String key = tablePrefix + i;
        listOfBatchDelete.add(new Delete(Bytes.toBytes(key)));

        if (listOfBatchDelete.size() == batchSize) {
            table.delete(listOfBatchDelete);
            listOfBatchDelete.clear();
        }
    }

    // flush whatever is left over from the last partial batch
    if (!listOfBatchDelete.isEmpty()) {
        table.delete(listOfBatchDelete);
    }
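
    Note that the last partial batch needs a final flush: 401 keys at a batch size of 50 leave one key over. The chunking itself can be sketched in plain Ruby, independent of the HBase API:

    ```ruby
    # Plain-Ruby sketch of the batching idea: each_slice splits the key range
    # into chunks of at most 50, one delete call per chunk.
    keys = (500..900).map { |i| "user_#{i}" }  # 401 keys
    batches = keys.each_slice(50).to_a
    puts batches.length       # 9 delete calls: 8 full batches of 50, plus...
    puts batches.last.length  # ...a final partial batch of 1
    ```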
    

    Creating HBase conf and getting table instance.

    Configuration hConf = HBaseConfiguration.create(conf);
    hConf.set("hbase.zookeeper.quorum", "Zookeeper IP");              // placeholder: your ZK quorum host(s)
    hConf.set("hbase.zookeeper.property.clientPort", ZookeeperPort);  // placeholder: your ZK client port, e.g. "2181"

    HTable hTable = new HTable(hConf, tableName);