Tags: java, hadoop, accumulo

How to override the functions of SortedKeyValueIterator interface in Accumulo?


I am trying to create a custom iterator, but since I could not find any tutorials, I looked at the code in the Accumulo GitHub repository.

There I found that all the iterator classes implement SortedKeyValueIterator and override its methods.

What is the role of these methods, and how should I approach overriding them when creating a new class that implements SortedKeyValueIterator?

This is the sample code from the RowFilter class that I was trying to understand.

    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options, IteratorEnvironment env) throws IOException {
        super.init(source, options, env);
        this.decisionIterator = new RowIterator(source.deepCopy(env));
    }

    public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
        RowFilter newInstance;
        try {
            newInstance = getClass().newInstance();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        newInstance.setSource(getSource().deepCopy(env));
        newInstance.decisionIterator = new RowIterator(getSource().deepCopy(env));
        return newInstance;
    }

I want to know what this code does, and how I should override these methods if I want another class to implement SortedKeyValueIterator.


Solution

  • Start with a look at the Javadoc on SortedKeyValueIterator -- http://accumulo.apache.org/1.6/apidocs/. That's a good starting place for what each method is supposed to do.

    A good analogy when writing an iterator is to think of your Accumulo table (which the iterator operates on) as a singly linked list in sorted order. next() moves to the next node in the list, and seek() moves forward through the list to the start of the requested range, skipping zero or more nodes. init() provides any necessary configuration (your iterator is likely running inside the Accumulo server) from both the Accumulo table configuration and the client. deepCopy() should return a new, independent instance configured the same way as the current iterator (similar in spirit to Object.clone()); per the Javadoc, the copy behaves as though seek() had not yet been called on it.

    I can also provide two examples of custom iterators:

    You can also take a look at the other "user-facing" iterators that ship with Accumulo: http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/user/package-summary.html
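
The linked-list analogy above can be sketched without any Accumulo dependency. The code below is only an illustration under stated assumptions: SketchIterator, ListSource, PrefixFilter and IteratorSketch are hypothetical names invented for this example, keys and values are plain Strings, and seek() takes a start key rather than a Range with column families. The real SortedKeyValueIterator has a richer signature (Key, Value, Range, IteratorEnvironment), so treat this as a picture of the roles of init(), seek(), next() and deepCopy(), not as the Accumulo API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for SortedKeyValueIterator<Key,Value>.
interface SketchIterator {
    void init(SketchIterator source, Map<String, String> options);
    void seek(String startKey);  // position at the first entry with key >= startKey
    void next();                 // advance to the following entry
    boolean hasTop();            // true if there is a current entry
    String getTopKey();
    String getTopValue();
    SketchIterator deepCopy();   // independent copy that has not been seeked yet
}

// Bottom of the stack: walks a sorted in-memory "table".
class ListSource implements SketchIterator {
    private final TreeMap<String, String> data;
    private final List<String> keys;
    private int pos;

    ListSource(TreeMap<String, String> data) {
        this.data = data;
        this.keys = new ArrayList<>(data.keySet()); // already sorted
        this.pos = keys.size();                     // no top entry until seek()
    }

    public void init(SketchIterator source, Map<String, String> options) { }

    public void seek(String startKey) {
        pos = 0;
        while (pos < keys.size() && keys.get(pos).compareTo(startKey) < 0) {
            pos++;
        }
    }

    public void next() { pos++; }
    public boolean hasTop() { return pos < keys.size(); }
    public String getTopKey() { return keys.get(pos); }
    public String getTopValue() { return data.get(keys.get(pos)); }
    public SketchIterator deepCopy() { return new ListSource(data); }
}

// A custom iterator: wraps a source and only exposes values with a given prefix.
class PrefixFilter implements SketchIterator {
    private SketchIterator source;
    private String prefix = "";

    public void init(SketchIterator source, Map<String, String> options) {
        this.source = source;                            // the iterator below us
        this.prefix = options.getOrDefault("prefix", ""); // configuration
    }

    public void seek(String startKey) { source.seek(startKey); skipRejected(); }
    public void next() { source.next(); skipRejected(); }
    public boolean hasTop() { return source.hasTop(); }
    public String getTopKey() { return source.getTopKey(); }
    public String getTopValue() { return source.getTopValue(); }

    public SketchIterator deepCopy() {
        PrefixFilter copy = new PrefixFilter();
        copy.source = source.deepCopy(); // copy the source too, never share it
        copy.prefix = prefix;            // carry the configuration over
        return copy;
    }

    // Move the source forward until its top entry is one we accept.
    private void skipRejected() {
        while (source.hasTop() && !source.getTopValue().startsWith(prefix)) {
            source.next();
        }
    }
}

class IteratorSketch {
    // Drives an iterator the way a scan would: seek once, then read and advance.
    static List<String> scan(SketchIterator it, String startKey) {
        List<String> out = new ArrayList<>();
        it.seek(startKey);
        while (it.hasTop()) {
            out.add(it.getTopKey() + "=" + it.getTopValue());
            it.next();
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("row1", "apple");
        table.put("row2", "banana");
        table.put("row3", "avocado");
        table.put("row4", "apricot");

        PrefixFilter filter = new PrefixFilter();
        filter.init(new ListSource(table), Collections.singletonMap("prefix", "a"));

        System.out.println(scan(filter, "row2"));        // [row3=avocado, row4=apricot]
        System.out.println(scan(filter.deepCopy(), "")); // [row1=apple, row3=avocado, row4=apricot]
    }
}
```

Note how deepCopy() in the sketch mirrors the RowFilter code from the question: it creates a new instance, gives it its own deep copy of the source, and carries over the configuration, so the copy can be seeked independently of the original.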