Tags: python, google-cloud-platform, google-cloud-bigtable, happybase

Perform a google.cloud.happybase Bigtable RowKeyRegexFilter Scan


UPDATE: This only happens with the Google Cloud Bigtable emulator, not with actual development or production Bigtable instances (Google Cloud SDK 149.0.0).

I'm trying to filter rows with a row-key regex filter. Everything else works like a charm (filter by prefix, by key start and stop range, by key, by keys), but I can't get it working when passing a RowKeyRegexFilter as the filter: it just returns all the keys, as if it were a scan with no filter at all:

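# all the boilerplate to create a happybase connection skipped
# RowKeyRegexFilter comes from the Bigtable client library
from google.cloud.bigtable.row_filters import RowKeyRegexFilter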
t = connection.table("sometable")
t.put(
    b'row1',
    {
       b"family1:col2": b".1",
       b"family2:col2": b".12",
    }
)
t.put(
    b'row2',
    {
       b"family1:col2": b".2",
       b"family2:col2": b".22",
    }
)
t.put(
    b'row3',
    {
       b"family1:col2": b".3",
       b"family2:col2": b".32",
    }
)
rows = t.scan(
    filter=RowKeyRegexFilter(b'.+3')
)
print(len([i for i in rows]))

That always prints 3, even with a regex such as (nomatchforsure)+. I could not find any documentation with a working example. The most surprising thing is that google.cloud.happybase.table.Table.rows always filters by row key using RowKeyRegexFilter, yet passing a regex to the rows method instead of real row keys doesn't give regex filtering either. You can see it

here: https://github.com/GoogleCloudPlatform/google-cloud-python-happybase/blob/master/src/google/cloud/happybase/table.py#L197

and here: https://github.com/GoogleCloudPlatform/google-cloud-python-happybase/blob/master/src/google/cloud/happybase/table.py#L971
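For reference, the result I would expect from the scan can be sketched locally with Python's re module. This is only a stand-in: it assumes the filter has to match the whole row key, it uses Python's regex engine rather than the RE2 engine Bigtable uses, and the helper name keys_matching is made up for illustration.

```python
import re

# Row keys written by the snippet in the question.
ROW_KEYS = [b"row1", b"row2", b"row3"]


def keys_matching(pattern: bytes, keys):
    """Return the keys that the pattern matches in full.

    Local stand-in for the expected filter behaviour; it does not
    talk to Bigtable or to the emulator.
    """
    compiled = re.compile(pattern)
    return [key for key in keys if compiled.fullmatch(key)]


if __name__ == "__main__":
    print(keys_matching(rb".+3", ROW_KEYS))
    print(keys_matching(rb"(nomatchforsure)+", ROW_KEYS))
```

Under those assumptions only row3 matches the first pattern and nothing matches the second, so the scan in the question should count 1 and 0 rows respectively, not 3.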

Any help on this would be much appreciated.


Solution

  • UPDATE: This is actually noted in the docs, as pointed out by @gary-elliott: https://cloud.google.com/bigtable/docs/emulator#filters "Regular expressions must contain only valid UTF-8 characters, unlike the actual Cloud Bigtable service which can process regular expressions as arbitrary bytes." However, something as simple as (notmatchforsure)+ does not work either, even though it contains only valid UTF-8 characters. From my testing, I would say regex filtering in the emulator is not merely limited but generally not working. Anyway, the docs do warn about it.

    The actual problem is a bug in the emulator; I updated the answer to avoid misleading feedback. The solution was to create a development instance for testing the code. So for now, if you want to do development with regex filters in Bigtable, you have to create (and pay for...) at least a development instance ($0.65/hour, $0.17/GB at the time of this answer). Hope this helps; anyone expecting to play with the emulator could otherwise get stuck for hours, as I was.
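The UTF-8 restriction quoted from the emulator docs can at least be checked up front. Below is a minimal sketch with a hypothetical helper, is_valid_utf8, that is not part of happybase or the Bigtable client. Passing this check is necessary under the documented limitation but, going by the testing described above, it does not guarantee that the emulator will actually apply the filter.

```python
def is_valid_utf8(pattern: bytes) -> bool:
    """Return True if the regex bytes decode cleanly as UTF-8."""
    try:
        pattern.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Pattern from the question: plain ASCII, so valid UTF-8.
    print(is_valid_utf8(b".+3"))
    # 0xff can never appear in valid UTF-8, so this pattern would be
    # outside what the emulator docs say it supports.
    print(is_valid_utf8(b"\xff.+"))
```

A check like this only helps rule out the documented limitation when a regex scan misbehaves locally; for the behaviour described here, a real development instance was still needed.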