
How efficient is HBase's RegexStringComparator in a RowFilter?


As the title says: how efficient is HBase's RegexStringComparator in a RowFilter in the following three cases?

1: Matching the beginning of the row key, such as "abc*" or "abc\d". I assume this is efficient because it does not need to scan the whole table.

2: Matching in the middle of the row key, such as "\d{3,4}abc\w+". I think this has to scan every row and is therefore inefficient.

3: Matching at the end of the row key. As in the second case, I expect this to be inefficient.

Is my understanding correct?


Solution

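  • Only a range scan bounded by STARTROW and ENDROW reduces the number of rows that are read. A FILTER is evaluated against every row the SCAN visits, so it cannot skip rows on its own. All three cases you describe therefore have the same efficiency: without row bounds, each one reads the whole table.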
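To make the difference concrete, here is a minimal sketch in plain Java that does not use the HBase client at all. It models a table as a sorted map of row keys and counts how many rows a regex check has to look at with and without row bounds. The class `PrefixScanSketch`, its helper `stopRowForPrefix`, and the sample keys are all hypothetical names made up for illustration. The sketch assumes ASCII row keys, so that Java string order agrees with byte order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical simulation: a TreeMap stands in for a table sorted by row key.
public class PrefixScanSketch {

    /** Outcome of one simulated scan: rows visited and rows that matched. */
    static final class ScanResult {
        final int examined;
        final List<String> matches;

        ScanResult(int examined, List<String> matches) {
            this.examined = examined;
            this.matches = matches;
        }
    }

    /**
     * Exclusive upper bound for all keys that start with the prefix, built by
     * incrementing the last character ("abc" becomes "abd").
     * Assumption: the prefix is non-empty and its last character can be incremented.
     */
    static String stopRowForPrefix(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    /**
     * Visits every row in [startRow, stopRow) and applies the regex to each key.
     * Passing null for both bounds models a scan with a filter but no row range.
     */
    static ScanResult scan(NavigableMap<String, String> table,
                           String startRow, String stopRow, Pattern rowRegex) {
        NavigableMap<String, String> range = table;
        if (startRow != null && stopRow != null) {
            range = table.subMap(startRow, true, stopRow, false);
        }
        int examined = 0;
        List<String> matches = new ArrayList<>();
        for (String rowKey : range.keySet()) {
            examined++; // the regex runs once per visited row, match or not
            if (rowRegex.matcher(rowKey).find()) {
                matches.add(rowKey);
            }
        }
        return new ScanResult(examined, matches);
    }

    /** Made-up sample data; the values are irrelevant to the sketch. */
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {"aaa1", "abc1", "abc2", "abcx", "abd1", "xyz9"}) {
            table.put(key, "value");
        }
        return table;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        Pattern beginsWithAbcDigit = Pattern.compile("^abc\\d");

        ScanResult filterOnly = scan(table, null, null, beginsWithAbcDigit);
        ScanResult bounded = scan(table, "abc", stopRowForPrefix("abc"), beginsWithAbcDigit);

        System.out.println("filter only: examined " + filterOnly.examined
                + ", matched " + filterOnly.matches);
        System.out.println("with bounds: examined " + bounded.examined
                + ", matched " + bounded.matches);
    }
}
```

Both scans return the same rows, but the bounded one visits only the keys between "abc" and "abd". For the prefix case in the question, the same idea would apply to a real scan: derive a start and end row from the fixed prefix and keep the regex filter for the remainder of the pattern. Patterns that match in the middle or at the end of the key have no fixed prefix to bound on, so they cannot benefit from this.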