I'm trying to retrieve the oldest cell of a certain row in Bigtable in my Dataflow pipeline (using Beam SDK 2.4.0). However, I can't seem to find any type of filter that would allow me to do this.
Further down the pipeline the value of the oldest cell would be used in conjunction with the newest cell and be written to BigQuery. This is what I have so far to retrieve the most recent cell:
    input.apply("Read protos from BigTable", BigtableIO.read()
            .withProjectId(config.getBigtableProject())
            .withInstanceId(config.getBigtableInstance())
            .withTableId(this.bigTableId)
            .withRowFilter(RowFilter.newBuilder()
                .setFamilyNameRegexFilter("proto")
                .setCellsPerColumnLimitFilter(1)
                .build()))
        .apply("Row to TableRow", ParDo.of(new DoFn<Row, TableRow>() { ...
I would expect there to be something similar, selecting 1 cell but in reverse order?
Any ideas?
This is possible, but there's no easy answer. In general, Bigtable only allows one form of ordering. In the case of cells, versions are ordered from the largest timestamp to the smallest.
If you want to get a notion of "oldest", you can do one of the following:
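Reverse the timestamps: set each cell's timestamp to Long.MAX_VALUE - now when you write, and then you can use the standard ordering. The oldest cell then carries the largest timestamp, so it sorts first.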
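The reversed-timestamp idea can be sketched in plain Java, without touching Bigtable at all. The class name ReversedTimestamps and the helpers reverse and restore are hypothetical names chosen for illustration, and the sample timestamps are made up. The sketch only shows why the oldest write ends up first under a largest-to-smallest ordering. It leaves out the actual write call and any rounding that a table's timestamp granularity might require.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReversedTimestamps {

    // Hypothetical helper: the value to store as the cell timestamp at write time.
    static long reverse(long timestampMicros) {
        return Long.MAX_VALUE - timestampMicros;
    }

    // Hypothetical helper: recover the real write time from a stored timestamp.
    static long restore(long reversedTimestamp) {
        return Long.MAX_VALUE - reversedTimestamp;
    }

    public static void main(String[] args) {
        // Three made-up write times, oldest first.
        long[] writeTimes = {1_000_000L, 2_000_000L, 3_000_000L};

        List<Long> stored = new ArrayList<>();
        for (long t : writeTimes) {
            stored.add(reverse(t));
        }

        // Mimic the version ordering described above: largest timestamp first.
        stored.sort(Comparator.reverseOrder());

        // The first cell in that ordering is the oldest write.
        System.out.println(restore(stored.get(0))); // prints 1000000
    }
}
```

One consequence of this choice is that the same ordering no longer puts the newest cell first, so a pipeline that needs both the oldest and the newest value has to account for that when reading.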