
How to query across multiple remote Accumulo instances


I'm new to Accumulo and have a newbie question.

I have several independent, remote Accumulo instances. I would like to run a single query across all of them simultaneously and aggregate the results. Is there a library, a standard method, or a best practice for doing this?

Thanks.


Solution

  • There is no Accumulo-recommended way to do this. Just as you consider them independent clusters, we would also consider them independent and rely on you, the user, to aggregate the data you query from each. Since Scanners and BatchScanners expose an Iterator, it is straightforward to merge the results from each one. Guava's Iterators class might be helpful here, for example Iterators.concat to chain the results or Iterators.mergeSorted to keep them in sorted order.
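
To make the merging idea concrete, here is a minimal sketch in plain Java with no Accumulo or Guava dependency. The class name MultiClusterMerge and the method mergeSorted are hypothetical names made up for illustration, not part of any Accumulo API. The sketch assumes each input iterator is already sorted, which holds for a Scanner but not for a BatchScanner (BatchScanner results come back in no guaranteed order, so simple concatenation is the safer choice there). In real use each input would presumably be the iterator() of a Scanner opened against one cluster, with entries compared by their Key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical helper (not an Accumulo class): lazily merges several
// already-sorted iterators into one sorted iterator.
final class MultiClusterMerge {

    // Pairs the next unread element with the iterator it came from.
    private static final class Source<T> {
        T head;
        final Iterator<? extends T> rest;

        Source(T head, Iterator<? extends T> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    static <T> Iterator<T> mergeSorted(List<? extends Iterator<? extends T>> inputs,
                                       Comparator<? super T> order) {
        // The queue always holds one entry per non-exhausted input,
        // ordered by that input's current head element.
        PriorityQueue<Source<T>> queue =
                new PriorityQueue<>((a, b) -> order.compare(a.head, b.head));
        for (Iterator<? extends T> in : inputs) {
            if (in.hasNext()) {
                queue.add(new Source<T>(in.next(), in));
            }
        }
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return !queue.isEmpty();
            }

            @Override
            public T next() {
                Source<T> smallest = queue.poll();
                if (smallest == null) {
                    throw new NoSuchElementException();
                }
                T result = smallest.head;
                // Refill from the same input so it stays represented in the queue.
                if (smallest.rest.hasNext()) {
                    smallest.head = smallest.rest.next();
                    queue.add(smallest);
                }
                return result;
            }
        };
    }

    public static void main(String[] args) {
        // Stand-ins for the sorted results of two independent clusters.
        Iterator<String> clusterA = Arrays.asList("row1", "row4").iterator();
        Iterator<String> clusterB = Arrays.asList("row2", "row3").iterator();

        Iterator<String> merged = mergeSorted(
                Arrays.asList(clusterA, clusterB), Comparator.<String>naturalOrder());
        List<String> rows = new ArrayList<>();
        while (merged.hasNext()) {
            rows.add(merged.next());
        }
        System.out.println(rows); // [row1, row2, row3, row4]
    }
}
```

The merge is lazy: it pulls one element at a time from each input, so nothing has to be buffered in memory beyond one entry per cluster. If you already depend on Guava, Iterators.mergeSorted does the same job.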