
How do I perform aggregation over the column qualifier field in Accumulo?


Suppose I have a Table like this in Accumulo:

a cf1:cq1 [ ]    1

b cf1:cq1 [ ]    3

c cf1:cq1 [ ]    2

If I apply the SummingCombiner to this table and insert the entry "a cf1:cq1 2", I get this result:

a cf1:cq1 [ ]    3

b cf1:cq1 [ ]    3

c cf1:cq1 [ ]    2

What I want to know is whether there is an iterator that can aggregate over a particular field, such as the column qualifier.

In short, can I run a query like "sum of the values of all rows where the column qualifier is cq1"?

If there is no ready-made iterator for this kind of query, how should I go about writing a custom iterator for it?


Solution

  • I don't think Accumulo ships anything that does exactly what you're asking, but https://github.com/joshelser/accumulo-column-summing is very similar and could serve as a good starting point.

    You could also use the ColumnSliceFilter to limit the results to the column qualifier you want, and then either write a simple summing iterator of your own or just sum the values client-side.
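
The client-side option can be sketched roughly as follows. This is only an illustration of the filter-then-sum logic in plain Java, not Accumulo code: `Cell` and `sumForQualifier` are hypothetical names standing in for the key/value pairs a scan would return, and the values are assumed to be stored as decimal strings. With a real Scanner, restricting the scan to one column (for example with `fetchColumn`) would take the place of the `if` check, and the loop would run over the scanner's entries instead of a list.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnQualifierSum {

    /** Hypothetical stand-in for one Accumulo key/value pair (not an Accumulo class). */
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;
        final String value; // assumption: values are decimal strings such as "3"

        Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Sums the values of every cell whose column family and qualifier match. */
    static long sumForQualifier(Iterable<Cell> cells, String family, String qualifier) {
        long sum = 0;
        for (Cell cell : cells) {
            if (cell.family.equals(family) && cell.qualifier.equals(qualifier)) {
                // addExact throws on overflow instead of silently wrapping around
                sum = Math.addExact(sum, Long.parseLong(cell.value));
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // The table from the question after the SummingCombiner has been applied,
        // plus one made-up cq2 entry to show that other qualifiers are skipped.
        List<Cell> table = new ArrayList<>();
        table.add(new Cell("a", "cf1", "cq1", "3"));
        table.add(new Cell("b", "cf1", "cq1", "3"));
        table.add(new Cell("c", "cf1", "cq1", "2"));
        table.add(new Cell("c", "cf1", "cq2", "7"));

        System.out.println(sumForQualifier(table, "cf1", "cq1")); // prints 8
    }
}
```

Summing on the client is the simplest route, but every matching entry has to be sent over the network. A custom server-side iterator along the lines of the linked project would return only partial sums, which matters more as the table grows.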