java, hadoop, accumulo

Writing an Accumulo iterator to return a random sample of a percentage of a table


I am looking at writing an Accumulo iterator that returns a random sample containing a given percentage of a table's entries.

I would appreciate any suggestions.

Thanks,

Chris


Solution

  • You can extend org.apache.accumulo.core.iterators.Filter and have accept() keep each entry at random with probability x%. The following iterator returns roughly 5 percent of the entries.

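    import java.util.Random;
    
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.iterators.Filter;
    
    /**
     * Keeps each key/value pair independently with probability ACCEPT_RATE,
     * so a scan returns roughly (not exactly) 5% of the entries.
     */
    public class RandomAcceptFilter extends Filter {
        private static final double ACCEPT_RATE = 0.05;
        private final Random rand = new Random();
    
        @Override
        public boolean accept(Key k, Value v) {
            // nextDouble() is uniform on [0, 1), so this is true about 5% of the time
            return rand.nextDouble() < ACCEPT_RATE;
        }
    }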
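To check what the filter's accept() rule does without a running Accumulo instance, here is a minimal standalone sketch of the same per-entry random sampling (Bernoulli sampling). It has no Accumulo dependency; the class name BernoulliSampler and its methods are hypothetical, and the fixed seed is an addition for repeatability (the filter above uses an unseeded Random).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical standalone sketch of RandomAcceptFilter's decision rule:
 * each entry is kept independently with probability `fraction`.
 */
final class BernoulliSampler {
    private final double fraction;
    private final Random rand;

    /**
     * @param fraction probability of keeping each entry, e.g. 0.05 for 5%
     * @param seed     fixed seed so the same input yields the same sample
     */
    BernoulliSampler(double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]: " + fraction);
        }
        this.fraction = fraction;
        this.rand = new Random(seed);
    }

    /** Same rule as RandomAcceptFilter.accept(), minus the Key/Value arguments. */
    boolean accept() {
        // nextDouble() is uniform on [0, 1): fraction 0.0 keeps nothing, 1.0 keeps everything
        return rand.nextDouble() < fraction;
    }

    /** Applies accept() to each entry in order and keeps those that pass, preserving order. */
    <T> List<T> sample(List<T> entries) {
        List<T> kept = new ArrayList<>();
        for (T entry : entries) {
            if (accept()) {
                kept.add(entry);
            }
        }
        return kept;
    }
}
```

For example, `new BernoulliSampler(0.05, 42L).sample(rows)` over 100,000 rows keeps about 5,000 of them; the exact count depends on the seed because every entry is decided independently. The same holds for RandomAcceptFilter: it returns approximately, not exactly, 5% of the entries, and since its Random is unseeded, two scans of the same table will generally return different samples. If the rate should be configurable rather than hard-coded, one option (untested here) is to read it from the options map passed to Filter's init(...) and supply it per scan with IteratorSetting.addOption(...).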