HBase - delete columns of rows with range of timestamp without scanning


I was wondering whether I could delete some columns of some rows by timestamp without scanning the whole database.

My code looks like this:

public static final void deleteBatch(long date, String column, String... ids) throws Exception {
    Connection con = null; // connection instance
    HTable table = null;   // htable instance

    // Build one Delete per row key so the whole batch goes out in a single call.
    List<Delete> deletes = new ArrayList<Delete>(ids.length);
    for (String id : ids) {
        Delete delete = new Delete(id.getBytes());
        // The qualifier must be a byte[]: Bytes.toBytes, not Bytes.toString.
        delete.addColumn(/* CF */, Bytes.toBytes(column));
        // also tried:
        // delete.addColumn(/* CF */, Bytes.toBytes(column), date);
        delete.setTimestamp(date);

        deletes.add(delete);
    }

    table.delete(deletes);
    table.close();
}

This works, but it deletes every version of the column up to the given date. I want something like this:

Delete delete = new Delete(id.getBytes());
delete.setTimestamp(date-1, date);

I don't want to delete everything before or after a specific date; I want to delete exactly the time range I give. Also, the MaxVersion of my HTableDescriptor is set to Integer.MAX_VALUE to keep all changes.

As mentioned in the Delete API documentation:

Specifying timestamps, deleteFamily and deleteColumns will delete all versions with a timestamp less than or equal to that passed

It deletes all columns whose timestamps are less than or equal to the given date.

How can I achieve that?

Any answer is appreciated.
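To make the gap between the documented behaviour and the requested one concrete, here is a toy model in plain Java. It is only a sketch and does not use the HBase API: the class `VersionRangeModel`, its methods, and the sample timestamps are all hypothetical, and the range is assumed to be inclusive at both ends. One cell's versions are held in a map keyed by timestamp.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only (not the HBase API): one cell's versions keyed by timestamp.
public class VersionRangeModel {

    // Mimics the documented behaviour: drop every version with timestamp <= ts.
    public static void deleteUpTo(NavigableMap<Long, String> versions, long ts) {
        versions.headMap(ts, true).clear();
    }

    // The behaviour asked for: drop only versions with minTs <= timestamp <= maxTs.
    public static void deleteRange(NavigableMap<Long, String> versions, long minTs, long maxTs) {
        versions.subMap(minTs, true, maxTs, true).clear();
    }

    // Hypothetical sample data: three versions of the same cell.
    public static NavigableMap<Long, String> sample() {
        NavigableMap<Long, String> versions = new TreeMap<>();
        versions.put(100L, "v1");
        versions.put(200L, "v2");
        versions.put(300L, "v3");
        return versions;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> a = sample();
        deleteUpTo(a, 200L);
        System.out.println(a.keySet()); // prints [300]

        NavigableMap<Long, String> b = sample();
        deleteRange(b, 199L, 200L);
        System.out.println(b.keySet()); // prints [100, 300]
    }
}
```

In this model, `deleteUpTo` also removes the older version at 100, which is the unwanted side effect described above, while `deleteRange` removes only the version at 200.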


Solution

  • After struggling for weeks, I found a solution to this problem.

    Apache HBase has a feature called coprocessors, which host and manage the core execution of data-level operations (get, delete, put, ...) and can be overridden (extended) for custom computations, such as data aggregation and bulk processing, that run against the data outside the client scope.

    There are some basic implementations for common problems, such as bulk delete.