Search code examples
hbaseapache-phoenix

Apache phoenix problem with order by and offset giving duplicate results in paging


I'm using order by with offset for paging in apache phoenix. I am getting a duplicate result on the next page.

I have a view of the HBase table in Apache phoenix. I am using TO_NUMBER() to convert my stringified column into an integer. I want to sort this converted column in descending order.

Example : I have an Hbase view like :

 ------------------------------
| TITLE      |   COUNT          |
  ------------------------------
| t8         | 10               |
  ------------------------------
| t9         | 2                |
| t4         | 1                |
| t6         | 1                |
| t10        | 1                |
| t7         | 1                |
| t43        | 0                |
| t14        | 0                |
| t11        | 0                |
| t42        | 0                |

My Query:

Select TITLE,COUNT from Hbase_view order by TO_NUMBER(COUNT) desc limit 5 offset 0;

Result :

+------------+------------------+
| TITLE      |   COUNT          |
+------------+------------------+
| t8         | 10               |
| t9         | 2                |
| t4         | 1                |
| t7         | 1                |
| t10        | 1                |

Next Query:

Select TITLE,COUNT from Hbase_view order by TO_NUMBER(COUNT) desc limit 5 offset 5;

result :

+------------+------------------+
| TITLE      |        COUNT     |
+------------+------------------+
| t7         | 1                |
| t43        | 0                |
| t14        | 0                |
| t11        | 0                |
| t42        | 0                |
+------------+------------------+

Here t7 is repeated on both the result.

Expected result :

1st Query:-
+------------+------------------+
| TITLE      |COUNT             |
+------------+------------------+
| t8         | 10               |
| t9         | 2                |
| t4         | 1                |
| t6         | 1                |
| t10        | 1                |

2nd Query
+------------+------------------+
| TITLE      |COUNT             |
+------------+------------------+
| t7         | 1                |
| t43        | 0                |
| t14        | 0                |
| t11        | 0                |
| t42        | 0                |
+------------+------------------+

Please help me to know what is the exact issue? Also, I want to know, is CURSOR can be used for the same purpose?.


Solution

  • Well I think because all these records have an equal count:

    | t4    | 1 |
    | t6    | 1 |           
    | t10   | 1 |           
    | t7    | 1 |
    

    With your query you cannot guarantee the order of these 4 records, and therefore your pagination query wouldn't work as you expect.

    You can try to also add the title in the 'ORDER BY' as a second order column. And you should have more consistent results.