
How to improve RowFilter performance in HBase?


In my case, I use a RowFilter to search for certain row keys in HBase. I want to do a fuzzy query, so I use a Scan with a RowFilter instead of a Get. However, when the table holds, for example, ten million row keys, the scan takes a very long time to return results. How can I improve the performance of a RowFilter query?

try {
    for (String uid : uidsArr) {
        // No start/stop row is set, so this is a full-table scan: SubstringComparator
        // matches uid anywhere in the row key, and every row has to be tested.
        Scan scan = new Scan();
        Filter filter1 = new RowFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator(uid));
        scan.setFilter(filter1);
        scan.setMaxVersions(versions);

        // try-with-resources closes the scanner once the results have been read
        try (ResultScanner scanner1 = table.getScanner(scan)) {
            for (Result res : scanner1) {
                list.addAll(getHBaseTableDataListFromCells(res.rawCells()));
            }
        }
    }

    return list;

} catch (Exception e) {
    e.printStackTrace();
}

Solution

  • To speed up the scan, specify start and stop row keys. Otherwise the scan has to look through ALL keys in the table, which is why it takes so long.

     new Scan().withStartRow(startRow).withStopRow(stopRow)
    

    For example, if you search by a value such as the uid, it is better to put that value at the beginning of the row key, so that the search string is a PREFIX of the key and the scan can be bounded by start and stop rows. However, this can cause a hot-region problem. Another solution is to maintain an additional lookup table.
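
As a rough illustration of the prefix approach, the sketch below computes a start row and an exclusive stop row that together cover every row key beginning with a given prefix. It assumes the uid sits at the very start of the row key and is encoded as UTF-8. The class name `PrefixScanRange` and its helper methods are hypothetical and not part of the HBase API. The code uses only the JDK, so it runs without an HBase cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: derives scan bounds for a row-key prefix.
class PrefixScanRange {

    /** Inclusive start row: the prefix bytes themselves. */
    public static byte[] startRow(String prefix) {
        return prefix.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Exclusive stop row: the smallest key that sorts after every key starting
     * with the prefix. Trailing 0xFF bytes cannot be incremented, so they are
     * dropped and the last remaining byte is incremented instead. An empty
     * array means "no upper bound" (HBase reads an empty stop row as end of table).
     */
    public static byte[] stopRow(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = startRow("user42");
        byte[] stop = stopRow(start);
        // prints "user42 .. user43"
        System.out.println(new String(start, StandardCharsets.UTF_8)
                + " .. " + new String(stop, StandardCharsets.UTF_8));
        // With the HBase client on the classpath, the bounds would be passed as:
        //   Scan scan = new Scan().withStartRow(start).withStopRow(stop);
    }
}
```

With these bounds the scan only visits rows whose keys fall between the two values, instead of testing every key in the table. Depending on the client version, `Scan.setRowPrefixFilter(byte[])` may do a similar computation for you. Note that this only helps when the searched value is a prefix of the key; a substring match in the middle of the key still requires a full scan or a separate lookup table.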