I've been trying to use MultiRowRangeFilter in Google Bigtable, but I haven't managed to make it work. I'm scanning and processing several row ranges from Bigtable using Dataflow:
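```java
import java.util.List;
import com.google.cloud.bigtable.beam.CloudBigtableIO;
import com.google.cloud.bigtable.beam.CloudBigtableScanConfiguration;
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.Read;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.MultiRowRangeFilter;
import org.apache.hadoop.hbase.filter.MultiRowRangeFilter.RowRange;

// Combine all the ranges to scan into a single MultiRowRangeFilter
List<RowRange> ranges = getRanges();
MultiRowRangeFilter filter = new MultiRowRangeFilter(ranges);
Scan scan = new Scan();
scan.setFilter(filter);
CloudBigtableScanConfiguration config = new CloudBigtableScanConfiguration.Builder()
    .withProjectId("my-project")
    .withInstanceId("my-instance")
    .withTableId("my-table")
    .withScan(scan)
    .build();
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
options.setProject("my-project");
options.setStagingLocation("gs://my-bucket");
options.setRunner(DataflowRunner.class);
// Read the (supposedly filtered) rows, process them, and write the results as text
Pipeline p = Pipeline.create(options);
p.apply(Read.from(CloudBigtableIO.read(config)))
    .apply(ParDo.of(new MyFunction()))
    .apply(TextIO.write().to("gs://output-bucket"));
p.run();
```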
`getRanges` is a function that returns a `List<RowRange>` whose ranges are initialized like this:

```java
RowRange range = new RowRange("1388710#1823246", true, "1388710#1823302", true);
```
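For reference, here is a stdlib-only Java model of which row keys that `RowRange` should match, assuming the usual HBase ordering of row keys (unsigned, lexicographic, byte by byte). The `RowRangeModel` class and its `Range` record are hypothetical stand-ins for illustration, not the HBase `RowRange` class or its implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowRangeModel {
    // Hypothetical stand-in that mirrors the four RowRange constructor arguments.
    // It only models the expected matching semantics; it is not HBase code.
    record Range(String start, boolean startInclusive, String stop, boolean stopInclusive) {
        boolean contains(String rowKey) {
            byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
            // Row keys are compared as unsigned bytes, lexicographically (not numerically)
            int vsStart = Arrays.compareUnsigned(key, start.getBytes(StandardCharsets.UTF_8));
            int vsStop = Arrays.compareUnsigned(key, stop.getBytes(StandardCharsets.UTF_8));
            boolean afterStart = startInclusive ? vsStart >= 0 : vsStart > 0;
            boolean beforeStop = stopInclusive ? vsStop <= 0 : vsStop < 0;
            return afterStart && beforeStop;
        }
    }

    public static void main(String[] args) {
        Range r = new Range("1388710#1823246", true, "1388710#1823302", true);
        System.out.println(r.contains("1388710#1823246"));  // true: start is inclusive
        System.out.println(r.contains("1388710#1823302"));  // true: stop is inclusive
        System.out.println(r.contains("1388710#18232999")); // true: lexicographic, not numeric
        System.out.println(r.contains("1388710#18233020")); // false: sorts after the stop key
        System.out.println(r.contains("1388711#1823250"));  // false: different prefix
    }
}
```

Because the comparison is byte-wise rather than numeric, a longer key such as `1388710#18232999` also falls inside the range, which is worth keeping in mind when checking what the scan actually returns.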
Instead of returning only the ranges I'm interested in, the scan returns all the data in my table.
Any idea what I'm doing wrong?
Per the discussion in the comments, MultiRowRangeFilter currently doesn't work with Cloud Dataflow; the feature request is tracked on GitHub here:
https://github.com/googleapis/cloud-bigtable-client/issues/1239
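Until that is supported, one possible workaround (an assumption, not something confirmed in this thread) is to build one plain start/stop-row `Scan` per range and merge the resulting reads, for example with Beam's `Flatten`. A plain HBase `Scan` treats the start row as inclusive and the stop row as exclusive, so an inclusive stop key like the ones above has to be bumped to its immediate successor, which is the same bytes followed by a single `0x00`. A stdlib-only sketch of that key arithmetic; `ExclusiveStopKey` and `exclusiveStopFor` are hypothetical names:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExclusiveStopKey {
    // Hypothetical helper: returns the smallest key that sorts strictly after
    // inclusiveStop in unsigned lexicographic byte order (the key plus a 0x00 byte).
    static byte[] exclusiveStopFor(byte[] inclusiveStop) {
        // Arrays.copyOf pads the extra trailing position with 0x00
        return Arrays.copyOf(inclusiveStop, inclusiveStop.length + 1);
    }

    public static void main(String[] args) {
        byte[] stop = "1388710#1823302".getBytes(StandardCharsets.UTF_8);
        byte[] exclusive = exclusiveStopFor(stop);
        System.out.println(exclusive.length);                            // 16
        System.out.println(Arrays.compareUnsigned(stop, exclusive) < 0); // true
    }
}
```

The resulting bytes could then be passed as the stop row of each per-range `Scan`, so every row up to and including the original stop key is read.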