Search code examples
javadata-structureshashmapiteratorpriority-queue

Implementing Iterator of the custom Collection backed by PriorityQueue and HashMap


I've implemented HashPriorityQueue class, which combines HashMap (for fast lookup) and PriorityQueue (for sorting the data).

Another improvement I made is to guarantee ordered iteration. Because PriorityQueue does not promise such a thing, from JavaDocs:

The Iterator provided in method iterator() is not guaranteed to traverse the elements of the priority queue in any particular order.

I also want to guarantee ordered iteration and should work in a multithreaded environment.

To make HashPriorityQueue class work I've already done the following steps:

  • Extended HashMap class.
  • Added a private field PriorityQueue.
  • Override all the methods that change the values of the HashMap so that I can add and remove values from my queue.
  • Added queue-related methods: poll() and peek().
  • Add an implementation of a custom Iterator for this new data structure that internally makes a copy of the queue and uses poll() in the next() method to maintain ordered iteration.

Code:

public class HashPriorityQueue<K, V> extends HashMap<K, V> implements Iterable<AbstractMap.SimpleEntry<K, V>>{
    private final PriorityQueue<K> queue;

    /* CONSTRUCTORS */

    public HashPriorityQueue(Comparator<K> comparator) {
        queue = new PriorityQueue(comparator);
    }

    public HashPriorityQueue() {
        queue = new PriorityQueue();
    }


    /* QUEUE METHODS */

    public AbstractMap.SimpleEntry<K, V> poll() {
        K key = queue.poll();
        V val = remove(key);
        return new AbstractMap.SimpleEntry<K, V>(key, val);
    }

    public AbstractMap.SimpleEntry<K, V> peek() {
        K key = queue.peek();
        V val = get(key);
        return new AbstractMap.SimpleEntry<K, V>(key, val);
    }

    @Override
    public V remove(Object key) {
        queue.remove(key);
        return super.remove(key);
    }

    public V remove(AbstractMap.SimpleEntry<V, K> entry) {
        return remove(entry.getKey());
    }


    @Override
    public V put(K key, V value) {
        queue.add(key);
        return super.put(key, value);
    }

    @Override
    public Iterator<AbstractMap.SimpleEntry<K, V>> iterator() {
        return new PriorityIterator();
    }



    private class PriorityIterator implements Iterator<AbstractMap.SimpleEntry<K, V>>{
        PriorityQueue<K> keys;
        K cursor;

        public PriorityIterator() {
            keys = new PriorityQueue<>(HashPriorityQueue.this.queue);
        }

        @Override
        public boolean hasNext() {
            return !keys.isEmpty();
        }

        @Override
        public AbstractMap.SimpleEntry<K, V> next() {
            cursor = keys.poll();
            V v = HashPriorityQueue.this.get(cursor);
            return new AbstractMap.SimpleEntry<>(cursor,v);
        }

        @Override
        public void remove() {
            HashPriorityQueue.this.remove(cursor);
        }
    }
}

Currently, the iterator creates a copy of the queue and iterates over the keys by polling them from the copy of the queue. The corresponding value is being obtained from the map using get().

The iterator isn't aware of any structural and non-structural medications of the map.

So the actual question is:

How can I ensure that my Collection has not been modified during the process of iteration? I was thinking of adding a boolean flag isChanged, would it be a good approach?

I would also appreciate other suggestions regarding this implementation.


Solution

  • which will also guarantee ordered iteration and should work in a multi-threaded environment.

    Firstly, let's start with a single-threaded implementation and then expand it.

    Disclaimer: I don't treat this problem as having practical application, but rather as a project having a value from the educational perspective.

    The rationale behind this remark would be clear if we enumerate the time complexities of the operations available with this custom Collection in comperison with TreeMap:

    • iterator() - O(n * log n) (worse than TreeMap);
    • remove() - O(n) (worse than TreeMap);
    • poll() - O(log n);
    • put() - O(log n);
    • peek() - O(1);
    • get() - O(1) (better than TreeMap).

    Operations with Iterator:

    • next() - O(log n) (worse than iterator returned by the TreeSet);
    • hasNext() - O(1);
    • iterating over the all keys - O(n * log n).

    get() - is the only method in this collection that performs better then corresponding methods of the TreeMap. All the rest have the same or even worse performance due to being effected by the ovehead of maintaining the PriorityQueue which has a logorithmic time complexity for enqueuing and dequeuing methods.

    Favor Composition over Inheritance

    When you need to be able to use the functionality of a particular class in another class, inheritance isn't the first approach you have to think about, rather the last resort.

    Why? Inheritance creates tight coupling, like in the code you've provided your new Collection happens to be completely dependent on the implementation of HashMap.

    In fact, your custom class is broken. You didn't override all the methods that allow to modify the map structurally, for instance compute(), merge(), etc. But there's more to that, since it's a subclass of HashMap, it can be altered by removing elements from the collection of values obtained via values() or by removing the entries from the entry set, in these case there would be not possible to change the PriorityQueue accordingly.

    Sure, values() and entrySet() can be overridden by making them return unmodifiable collections. So it turns out you need override almost each and every method. And nevertheless a new functionality which can be introduced in the future or changes of the existing methods might break your code.

    For more information on that see the Item "Favor composition over inheritance" of the "Effective Java" by Joshua Bloch (former Sun-employee which has developed many features of Java including Collections framework).

    The better approach would be to use Composition, i.e. replace IS-A relationship with HAS-A relationship by creating a class that uses a HashMap, i.e. has a field of type HashMap, rather than being a HashMap itself. And that would be a Loose coupling because your Collection no longer depends on the HashMap implementation, methods of your collection would delegate calls to the HashMap allowing it to do whatever it does internally.

    You are not bound with any contracts and can expose only a limited number of methods that allow to interact with your Collection.

    Note that unfortunately you can't implement Map interface because it might cause problems if interface would be extended.

    That how it might be implemented:

    class HashPriorityQueueNonSync<K, V> implements Iterable<Map.Entry<K, V>> {
        
        private final PriorityQueue<K> queue;
        private final Map<K, V> map = new HashMap<>();
        private int modCount = 0;
        
        /* CONSTRUCTORS */
        
        public HashPriorityQueueNonSync(Comparator<K> comparator) {
            queue = new PriorityQueue<>(comparator);
        }
        
        public HashPriorityQueueNonSync() {
            queue = new PriorityQueue<>();
        }
        
        /* QUEUE METHODS */
        
        public Map.Entry<K, V> poll() {
            modCount++;
            K key = queue.poll();
            V val = remove(key);
            return Map.entry(key, val);
        }
        
        public Map.Entry<K, V> peek() {
            K key = queue.peek();
            V val = map.get(key);
            return Map.entry(key, val);
        }
        
        /* MAP METHODS */
        
        public V get(Object key) {
            return map.get(key);
        }
        
        public V put(K key, V value) {
            modCount++;
            queue.add(key);
            return map.put(key, value);
        }
        
        public V remove(Object key) {
            modCount++;
            queue.remove(key);
            return map.remove(key);
        }
        
        public V remove(Map.Entry<K, V> entry) {
            modCount++;
            queue.remove(entry.getKey());
            return map.remove(entry.getKey());
        }
        
        @Override
        public Iterator<Map.Entry<K, V>> iterator() {
            return new PriorityIterator();
        }
        
        private class PriorityIterator implements Iterator<Map.Entry<K, V>> {
            private PriorityQueue<K> keys;
            private K cursor;
            private final int expectedModCount;
            
            public PriorityIterator() {
                this.keys = new PriorityQueue<>(HashPriorityQueueNonSync.this.queue);
                this.expectedModCount = HashPriorityQueueNonSync.this.modCount;
            }
            
            @Override
            public boolean hasNext() {
                return !keys.isEmpty();
            }
            
            @Override
            public Map.Entry<K, V> next() {
                if (expectedModCount != modCount) {
                    throw new ConcurrentModificationException();
                }
                cursor = keys.poll();
                V v = HashPriorityQueueNonSync.this.get(cursor);
                return Map.entry(cursor, v);
            }
            
            @Override
            public void remove() {
                if (expectedModCount != modCount) {
                    throw new ConcurrentModificationException();
                }
                HashPriorityQueueNonSync.this.remove(cursor);
            }
        }
    }
    

    How to prevent Structural modification during Iteration

    The common practice to ensure that iterator would reflect the actual state of the collection is to introduce two counters of modifications: one as the instance field of the collection (modCount in the code above), another in the Iterator (expectedModCount).

    When these variables are not equal, that means that there has been a modification after the iterator was created that was done other than the means of iterator. All the implementations of Iterator from the JDK would throw ConcurrentModificationException in such a case.

    Concurrent implementation

    First of all, there two concepts to consider:

    • Atomicity - when a thread is modifying the state of an object (in this case changes the underlying queue and map of the custom collection), all other threads can not observe intermediate changes, they can see either both queue and map fully modified or not. We can ensure that all actions on both queue and map would occur atomically by using synchronized keyword on methods of this custom Collection. The approach shown below is very basic and can be observed in legacy collections like Vector and Hashtable, but because under the hood we have two separate underlying collections which need to be accessed atomically it hard apply anything more neat than synchronized.

    • Happens-before Order - describes visibility of subsequent changes. If one action happens-before another, then the first is visible to and ordered before the second. One of the ways to ensure this is by using the volatile keyword.

    Note that methods of the iterator doesn't require synchronization (it's mean to be used withing a single thread only), but we need to synchronize the method of the collection responsible for obtaining the iterator.

    That's how concurrent implementation might look like:

    public class HashPriorityQueue<K, V> implements Iterable<Map.Entry<K, V>> {
    
        private final PriorityQueue<K> queue;
        private final Map<K, V> map = new HashMap<>();
        private volatile int modCount = 0;
        
        /* CONSTRUCTORS */
        
        public HashPriorityQueue(Comparator<K> comparator) {
            queue = new PriorityQueue<>(comparator);
        }
        
        public HashPriorityQueue() {
            queue = new PriorityQueue<>();
        }
        
        /* QUEUE METHODS */
        
        public synchronized Map.Entry<K, V> poll() {
            modCount++;
            K key = queue.poll();
            V val = remove(key);
            return Map.entry(key, val);
        }
        
        public synchronized Map.Entry<K, V> peek() {
            K key = queue.peek();
            V val = map.get(key);
            return Map.entry(key, val);
        }
        
        /* MAP METHODS */
        
        public synchronized V get(Object key) {
            return map.get(key);
        }
        
        public synchronized V put(K key, V value) {
            modCount++;
            queue.add(key);
            return map.put(key, value);
        }
        
        public synchronized V remove(Object key) {
            modCount++;
            queue.remove(key);
            return map.remove(key);
        }
        
        public synchronized V remove(Map.Entry<K, V> entry) {
            modCount++;
            queue.remove(entry.getKey());
            return map.remove(entry.getKey());
        }
        
        @Override
        public synchronized Iterator<Map.Entry<K, V>> iterator() {
            return new PriorityIterator();
        }
        
        private class PriorityIterator implements Iterator<Map.Entry<K, V>> {
            private PriorityQueue<K> keys;
            private K cursor;
            private final int expectedModCount;
            
            public PriorityIterator() {
                this.keys = new PriorityQueue<>(HashPriorityQueue.this.queue);
                this.expectedModCount = HashPriorityQueue.this.modCount;
            }
            
            @Override
            public boolean hasNext() {
                return !keys.isEmpty();
            }
            
            @Override
            public Map.Entry<K, V> next() {
                if (expectedModCount != modCount) {
                    throw new ConcurrentModificationException();
                }
                cursor = keys.poll();
                V v = HashPriorityQueue.this.get(cursor);
                return Map.entry(cursor, v);
            }
            
            @Override
            public void remove() {
                if (expectedModCount != modCount) {
                    throw new ConcurrentModificationException();
                }
                HashPriorityQueue.this.remove(cursor);
            }
        }
    }
    

    A very small Test

    main()

    public static void main(String[] args) {
        HashPriorityQueue<Integer, Integer> hpq = new HashPriorityQueue<>();
    
        ExecutorService executor = Executors.newFixedThreadPool(3);
    
        executor.submit(() -> { for (int i = 3; i < 7; i++) hpq.put(i, 1 << i); });
        executor.submit(() -> { for (int i = 0; i < 3; i++) hpq.put(i, 1 << i); });
        executor.submit(() -> { for (int i = 7; i < 10; i++) hpq.put(i, 1 << i); });
    
        try {
            executor.awaitTermination(3, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        executor.shutdown();
        
        hpq.forEach(System.out::println);
    }
    

    Output:

    0=1
    1=2
    2=4
    3=8
    4=16
    5=32
    6=64
    7=128
    8=256
    9=512