Tags: java, google-cloud-platform, google-cloud-bigtable, bigtable

Filtering data in Bigtable


My table contains a row key and two columns. The row key looks like string#timestamp; the first column contains a String value and the second contains JSON as a String. I currently read a range of rows like this:

        // Read the rows in the requested row-key range
        Query query = Query.create(bigTableTableName).range(rowKeyBeginning, rowKeyEnd);
        ServerStream<Row> rows = bigtableDataClient.readRows(query);
        for (Row row : rows) {
            // extract cells from the row
        }

For example:

rowKey                         First                Second
Greg#2023-04-01T12:23:00       cookie               "some JSON data"
Greg#2023-04-03T22:20:54       cake                 "some JSON data"
Greg#2023-04-03T15:03:23       cookie               "some JSON data"
Greg#2023-04-10T20:54:33       salad                "some JSON data"
Greg#2023-04-19T18:00:00       cookie               "some JSON data"
...

I need to retrieve the rows in the time range between Greg#2023-04-01T00:00:00 and Greg#2023-04-30T23:59:00, but the request should return only the rows where the value in the first column equals "cookie".

Is there a way to add an extra filter on the first column?


Solution

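  • Bigtable exposes an HBase-compatible API, and like HBase it indexes only the row key, so the only efficient way to narrow a read is by key (or key prefix). Therefore, request the rows you want to inspect and then iterate over them, keeping only those that match your condition. The Java client also lets you attach a filter to the Query (for example on a cell value), but that filter is evaluated against every row in the scanned key range rather than through an index.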

    That is why HBase fits well with distributed processing systems such as Spark on Dataproc: if you have a large number of rows, the post-processing can be distributed across the cluster.
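
The "scan the key range, then keep the matching rows" approach can be sketched roughly as follows. This is only an illustration that uses a sorted in-memory map in place of the table, so it runs without the Bigtable client. The class name CookieFilterSketch, the helper matchingKeys, the column names "first" and "second", and the extra May row are all assumptions made for the example. With the real client, the same check would presumably sit inside the for (Row row : rows) loop from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Stdlib-only stand-in for the table in the question; no Bigtable client is involved.
class CookieFilterSketch {

    // Hypothetical helper: scan a row-key range, then keep rows whose column equals `wanted`.
    static List<String> matchingKeys(NavigableMap<String, Map<String, String>> table,
                                     String startKey, String endKey,
                                     String column, String wanted) {
        List<String> result = new ArrayList<>();
        // Step 1: narrow by key range (start inclusive, end exclusive in this sketch).
        NavigableMap<String, Map<String, String>> range =
                table.subMap(startKey, true, endKey, false);
        // Step 2: post-filter on the column value, row by row.
        for (Map.Entry<String, Map<String, String>> entry : range.entrySet()) {
            if (wanted.equals(entry.getValue().get(column))) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Sample rows from the question, plus one made-up May row that lies outside the range.
    static NavigableMap<String, Map<String, String>> sampleTable() {
        NavigableMap<String, Map<String, String>> table = new TreeMap<>();
        table.put("Greg#2023-04-01T12:23:00", Map.of("first", "cookie", "second", "{}"));
        table.put("Greg#2023-04-03T22:20:54", Map.of("first", "cake", "second", "{}"));
        table.put("Greg#2023-04-03T15:03:23", Map.of("first", "cookie", "second", "{}"));
        table.put("Greg#2023-04-10T20:54:33", Map.of("first", "salad", "second", "{}"));
        table.put("Greg#2023-04-19T18:00:00", Map.of("first", "cookie", "second", "{}"));
        table.put("Greg#2023-05-02T09:00:00", Map.of("first", "cookie", "second", "{}"));
        return table;
    }

    public static void main(String[] args) {
        List<String> keys = matchingKeys(sampleTable(),
                "Greg#2023-04-01T00:00:00", "Greg#2023-04-30T23:59:00", "first", "cookie");
        System.out.println(keys);
        // prints [Greg#2023-04-01T12:23:00, Greg#2023-04-03T15:03:23, Greg#2023-04-19T18:00:00]
    }
}
```

The sorted map mirrors the fact that rows are ordered by key, which is what makes the range step cheap; the value check still has to touch every row in that range, and that is the part you would hand to a cluster when the range is large.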