java, xml, iterator, stax

Iterating over a two level structure using nested iterators


I have the following two level XML structure. A list of boxes, each containing a list of drawers.

<Boxes>
    <Box id="0">
        <Drawers>
            <Drawer id="0"/>
            <Drawer id="1"/>
            ...
        </Drawers>
    </Box>
    <Box id="1">
...
    </Box>
</Boxes>

I'm parsing it using StAX and exposing the structure through the following classes:

  1. BoxIterator implements Iterator<Box>, Iterable<Box>
  2. Box implements Iterable<Drawer>
  3. DrawerIterator implements Iterator<Drawer>

I can then do the following:

BoxIterator boxList;
for (Box box : boxList) {
  for (Drawer drawer : box) {
    drawer.getId();
  }
}

Under the hood of those Iterators I'm using StAX and both of them are accessing the same underlying XMLStreamReader. If I call BoxIterator.next() it will influence the result that will be returned on subsequent calls to DrawerIterator.next() because the cursor will have moved to the next box.

Does this break the contract of Iterator? Is there a better way to iterate over a two level structure using StAX?


Solution

  • Does this break the contract of Iterator?

    No.

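    The Java Iterator imposes two "contracts". The first contract is the Java interface itself, which declares hasNext(), next(), and remove(). Any class that implements Iterator must define hasNext() and next(); since Java 8, remove() is a default method that throws UnsupportedOperationException unless overridden, and a default forEachRemaining() is provided as well.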

    The second contract defines the behaviour of the Iterator:

    hasNext() [...] returns true if the iteration has more elements. [...] next() returns the next element in the iteration [and] throws NoSuchElementException if the iteration has no more elements.

    That is the entire contract.
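
    To make that behavioural contract concrete, here is a minimal sketch of an iterator that satisfies it. The Countdown class is a hypothetical example and has nothing to do with the StAX code in the question.

    ```java
    import java.util.Iterator;
    import java.util.NoSuchElementException;

    // Hypothetical example class: counts down from a start value to 1.
    class Countdown implements Iterator<Integer> {
        private int remaining;

        Countdown(int start) {
            this.remaining = start;
        }

        @Override
        public boolean hasNext() {
            // True while there are elements left to return.
            return remaining > 0;
        }

        @Override
        public Integer next() {
            // The contract requires NoSuchElementException once exhausted.
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return remaining--;
        }
        // remove() is not overridden, so the Java 8+ default applies
        // and throws UnsupportedOperationException.
    }
    ```

    Nothing in these two methods says anything about what else may touch the data the iterator reads from.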

    It is true that if the underlying XMLStreamReader is advanced by other code, it can mess up your BoxIterator and/or DrawerIterator. Likewise, calling BoxIterator.next() and/or DrawerIterator.next() at the wrong points could mess up the iteration. However, used correctly, such as in your example code above, it works properly and greatly simplifies the code. You just need to document the proper usage of the iterators.

    As a concrete example, the Scanner class implements Iterator<String>, and yet has many other methods, such as nextInt() and nextLine(), that advance the underlying stream. If the Iterator interface imposed a stronger contract, then the Scanner class itself would be violating it.
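
    A small sketch of the Scanner behaviour, using only standard java.util.Scanner calls. The ScannerDemo class and its input string are made up for illustration.

    ```java
    import java.util.Scanner;

    // Hypothetical demo class showing Iterator and non-Iterator calls on one Scanner.
    class ScannerDemo {
        static String demo() {
            Scanner scanner = new Scanner("alpha 42 beta");
            String first = scanner.next();   // Iterator<String>.next() returns "alpha"
            int number = scanner.nextInt();  // not an Iterator method, but it consumes "42"
            String second = scanner.next();  // the Iterator view never saw "42"; this is "beta"
            return first + "," + number + "," + second;
        }

        public static void main(String[] args) {
            System.out.println(demo()); // prints "alpha,42,beta"
        }
    }
    ```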


    As Ivan points out in the comments, boxList should not be an instance of a single class declared as class BoxIterator implements Iterator<Box>, Iterable<Box>. You really should have:

    class BoxList implements Iterable<Box> { ... }
    class BoxIterator implements Iterator<Box> { ... }
    
    BoxList boxList = ...;
    for (Box box : boxList) {
      for (Drawer drawer : box) {
        drawer.getId();
      }
    }
    

    While having one class implement both Iterable and Iterator is not technically wrong for your use case, it can cause confusion.

    Consider this code in another context:

    List<Box> boxList = Arrays.asList(box1, box2, box3, box4);
    for(Box box : boxList) {
        // Do something
    }
    for(Box box : boxList) {
        // Do some more stuff
    }
    

    Here, boxList.iterator() is called twice, to create two separate Iterator<Box> instances, for iterating the list of boxes twice. Because the boxList can be iterated over multiple times, each iteration requires a new iterator instance.

    In your code:

    BoxIterator boxList = new BoxIterator(xml_stream);
    for (Box box : boxList) {
      for (Drawer drawer : box) {
        drawer.getId();
      }
    }
    

    Because you are iterating over a stream, you can't iterate over the same nodes a second time (without rewinding the stream or storing the extracted objects). So a second class/object is not strictly needed; the same object can act as both Iterable and Iterator, which saves you one class/object.
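
    The possible confusion can be shown with a short sketch. OneShot is a hypothetical stand-in for a stream-backed class that is both Iterable and Iterator; it wraps a plain Iterator instead of an XML stream so the example stays self-contained.

    ```java
    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;

    // Hypothetical stand-in for a stream-backed BoxIterator: Iterable and Iterator in one object.
    class OneShot<T> implements Iterable<T>, Iterator<T> {
        private final Iterator<T> source;

        OneShot(Iterator<T> source) {
            this.source = source;
        }

        @Override
        public Iterator<T> iterator() {
            return this; // always the same cursor, wherever it currently stands
        }

        @Override
        public boolean hasNext() {
            return source.hasNext();
        }

        @Override
        public T next() {
            return source.next();
        }
    }

    class OneShotDemo {
        // Counts how many elements one for-each pass over the Iterable yields.
        static <T> int count(Iterable<T> items) {
            int n = 0;
            for (T ignored : items) {
                n++;
            }
            return n;
        }

        public static void main(String[] args) {
            List<String> list = Arrays.asList("box1", "box2", "box3");
            OneShot<String> oneShot = new OneShot<>(list.iterator());
            System.out.println(count(list) + " " + count(list));       // prints "3 3"
            System.out.println(count(oneShot) + " " + count(oneShot)); // prints "3 0"
        }
    }
    ```

    The second loop over the OneShot silently sees nothing, which is exactly what a reader used to List-like Iterables would not expect.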

    Having said that, premature optimization is the root of all evil. Saving one class/object is not worth the possible confusion; you should split BoxIterator into BoxList implements Iterable<Box> and BoxIterator implements Iterator<Box>.
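
    For illustration, here is one way the split could look on top of a shared XMLStreamReader. This is only a sketch under assumptions, not your actual implementation: ElementIterator is a hypothetical generic helper that plays the role of both BoxIterator and DrawerIterator, the element and attribute names are taken from the XML in the question, and StaxDemo exists only to exercise the classes.

    ```java
    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.NoSuchElementException;
    import java.util.function.Function;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;

    // Hypothetical helper: moves the shared reader to the next <name> start tag,
    // stopping at the </stopAt> end tag or at the end of the document.
    final class ElementIterator<T> implements Iterator<T> {
        private final XMLStreamReader reader;
        private final String name;
        private final String stopAt;
        private final Function<XMLStreamReader, T> factory;
        private boolean positioned; // reader sits on a <name> start tag not yet returned
        private boolean done;       // </stopAt> has been reached
        ElementIterator(XMLStreamReader reader, String name, String stopAt,
                        Function<XMLStreamReader, T> factory) {
            this.reader = reader;
            this.name = name;
            this.stopAt = stopAt;
            this.factory = factory;
        }

        @Override
        public boolean hasNext() {
            try {
                while (!positioned && !done && reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT && name.equals(reader.getLocalName())) {
                        positioned = true;
                    } else if (event == XMLStreamConstants.END_ELEMENT && stopAt.equals(reader.getLocalName())) {
                        done = true;
                    }
                }
            } catch (XMLStreamException e) {
                throw new IllegalStateException(e);
            }
            return positioned;
        }

        @Override
        public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            positioned = false;
            return factory.apply(reader); // the factory reads attributes at the current tag
        }
    }

    final class Drawer {
        private final String id;
        Drawer(XMLStreamReader reader) { this.id = reader.getAttributeValue(null, "id"); }
        String getId() { return id; }
    }

    final class Box implements Iterable<Drawer> { // usable only while the reader is inside this <Box>
        private final XMLStreamReader reader;
        private final String id;
        Box(XMLStreamReader reader) {
            this.reader = reader;
            this.id = reader.getAttributeValue(null, "id");
        }
        String getId() { return id; }
        @Override public Iterator<Drawer> iterator() {
            return new ElementIterator<>(reader, "Drawer", "Box", Drawer::new);
        }
    }

    final class BoxList implements Iterable<Box> { // single pass: the stream cannot be rewound
        private final XMLStreamReader reader;
        BoxList(XMLStreamReader reader) { this.reader = reader; }
        @Override public Iterator<Box> iterator() {
            return new ElementIterator<>(reader, "Box", "Boxes", Box::new);
        }
    }

    final class StaxDemo {
        // Collects "boxId/drawerId" pairs in document order.
        static List<String> collect(String xml) {
            List<String> seen = new ArrayList<>();
            try {
                XMLStreamReader reader =
                        XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
                for (Box box : new BoxList(reader)) {
                    for (Drawer drawer : box) {
                        seen.add(box.getId() + "/" + drawer.getId());
                    }
                }
            } catch (XMLStreamException e) {
                throw new IllegalStateException(e);
            }
            return seen;
        }
    }
    ```

    Note that in this sketch a Box whose drawers are never iterated is simply skipped, because the outer iterator scans forward to the next Box start tag. Iterating a Box after the outer iterator has moved on is still a usage error that has to be documented.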