accumulo

How to use a WholeRowIterator as the source of another iterator?


I am trying to filter out columns after using a WholeRowIterator to filter rows. This is to remove columns that were useful in determining which row to keep, but not useful in the data returned by the scan.

The WholeRowIterator does not appear to play nice as the source of another iterator such as a RegExFilter. I know the keys/values are encoded by the WholeRowIterator.

Are there any possible solutions to get this iterator stack to work?

Thanks.


Solution

  • Usually, the WholeRowIterator is the last iterator in the "stack" as it involves serializing the row (many key-values) into a single key-value. You probably don't want to do it more than once. But, let's assume you want to do that:

    You would want to write an iterator that deserializes each key-value into a SortedMap using WholeRowIterator.decodeRow(Key,Value), modifies the SortedMap, reserializes it into a single key-value, and returns that. This iterator needs a higher priority number than the WholeRowIterator: iterators are applied lowest priority number first, so the higher number places it above the WholeRowIterator in the stack, where it consumes the encoded rows.

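    Alternatively, you could extend the WholeRowIterator so that your unwanted columns are never serialized in the first place. Note that encodeRow(List<Key>,List<Value>) is a static helper, so it cannot itself be overridden; in versions where WholeRowIterator extends RowEncodingIterator, the instance method to override is rowEncoder(List<Key>,List<Value>). This saves the extra deserialization and reserialization that the first approach requires.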
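
A minimal, library-free sketch of the two approaches, meant only to show the data flow. It does not use Accumulo: cells are modeled as a SortedMap<String, String> keyed by a "family:qualifier" string, and the encodeRow/decodeRow methods here are hypothetical stand-ins with a made-up byte layout, not the format the WholeRowIterator writes. The class name RowColumnStripper, the method names, and the "filter:" column family used in the usage note are all illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Predicate;

final class RowColumnStripper {

    // Stand-in for row encoding: packs all cells of one row into a single byte[].
    // Layout (assumed for this sketch only): cell count, then column/value pairs.
    static byte[] encodeRow(SortedMap<String, String> cells) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(cells.size());
            for (Map.Entry<String, String> cell : cells.entrySet()) {
                out.writeUTF(cell.getKey());
                out.writeUTF(cell.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for row decoding: unpacks the byte[] back into a sorted map of cells.
    static SortedMap<String, String> decodeRow(byte[] packed) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed));
            int count = in.readInt();
            SortedMap<String, String> cells = new TreeMap<>();
            for (int i = 0; i < count; i++) {
                String column = in.readUTF();
                cells.put(column, in.readUTF());
            }
            return cells;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Approach 1: runs above the row encoder. Decode the packed row, drop the
    // unwanted columns, and encode again (one extra decode/encode per row).
    static byte[] stripAfterEncoding(byte[] packed, Predicate<String> keepColumn) {
        SortedMap<String, String> cells = decodeRow(packed);
        cells.keySet().removeIf(column -> !keepColumn.test(column));
        return encodeRow(cells);
    }

    // Approach 2: customize the encoder so unwanted columns are never written.
    static byte[] encodeRowWithout(SortedMap<String, String> cells, Predicate<String> keepColumn) {
        SortedMap<String, String> kept = new TreeMap<>();
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            if (keepColumn.test(cell.getKey())) {
                kept.put(cell.getKey(), cell.getValue());
            }
        }
        return encodeRow(kept);
    }
}
```

With a keep-predicate such as column -> !column.startsWith("filter:"), both methods should produce the same packed row; the second simply skips the intermediate decode. In a real iterator, the stand-ins would be replaced by the WholeRowIterator's own decode and encode methods, and the cells would be Key/Value pairs rather than strings.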