Tags: java, java-7, objectinputstream, objectoutputstream

WriteObject not properly writing a Set?


I hope I didn't just find a bug in Java! I am running JDK 7u11 (mostly because that is the JVM sanctioned by my employer), and I am noticing a very odd issue.

Namely, I am chunking data into a LinkedHashSet and writing it to a file using an ObjectOutputStream daisy-chained through a GZIPOutputStream (mentioning this just in case it matters).
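
The streams are chained roughly like this (the file name here is just a placeholder, not my actual code):

// ObjectOutputStream -> java.util.zip.GZIPOutputStream -> file
ObjectOutputStream oos = new ObjectOutputStream(
        new GZIPOutputStream(
            new FileOutputStream("waypoints.bin")));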

Now, when I get to the other side of the program and call readObject(), I notice that the size always reads 68, which is the size of the first chunk. The underlying set can contain many more or fewer than 68 elements, but the .size() method always returns 68. More troubling, when I try to manually iterate the deserialized Set, it also stops at 68.

while(...) {
    oos.writeInt(p_rid);
    oos.writeObject(wptSet);
    wptSet.clear();
    // wptSet = new LinkedHashSet<>(); // This somehow causes the heap size to increase dramatically, but it does solve the problem
}

And when reading it

Set<Coordinate> coordinates = (Set<Coordinate>) ois.readObject();

the coordinates.size() always returns 68. Now, I could work around this by also writing the size with writeInt(), but I can still only iterate through 68 members!

Notice that the wptSet = new LinkedHashSet<>() line actually solves the issue. The main problem with it is that it makes my heap size skyrocket when I watch the program in JVisualVM.

Update: I just found a viable workaround that avoids the memory growth of re-instantiating wptSet: System.gc(). Calling it after each call to .clear() actually keeps the memory leak away.

Either way, I shouldn't have to do this, and writing out the LinkedHashSet should not exhibit this behavior.


Solution

  • Alright, I think I understand what you are asking.

    Here is an example to reproduce...

    import java.util.*;
    import java.io.*;
    
    class Example {
        public static void main(String[] args) throws Exception {
            Set<Object> theSet = new LinkedHashSet<>();
            final int size = 3;
    
            for(int i = 0; i < size; ++i) {
                theSet.add(i);
            }
    
            ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
            ObjectOutputStream objectsOut = new ObjectOutputStream(bytesOut);
    
            for(int i = 0; i < size; ++i) {
                objectsOut.writeObject(theSet);
                theSet.remove(i); // mutate theSet for each write
            }
    
            ObjectInputStream objectsIn = new ObjectInputStream(
                new ByteArrayInputStream(bytesOut.toByteArray()));
    
            for(;;) {
                try {
                    System.out.println(((Set<?>)objectsIn.readObject()).size());
                } catch(EOFException e) {
                    break;
                }
            }
        }
    }
    

    The output is

    3
    3
    3
    

    What is going on here is that ObjectOutputStream detects that you are writing the same object every time. The first time theSet is written, its contents are serialized in full; each subsequent writeObject() call only emits a back-reference (a "shared reference") to the already-written object, so deserialization yields that same first object every time. This is explained in the documentation:

    Multiple references to a single object are encoded using a reference sharing mechanism so that graphs of objects can be restored to the same shape as when the original was written.
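
    You can see this directly by checking reference identity when reading the stream from the example above (a quick sketch reusing bytesOut):

    ObjectInputStream in = new ObjectInputStream(
        new ByteArrayInputStream(bytesOut.toByteArray()));
    Set<?> first = (Set<?>) in.readObject();
    Set<?> second = (Set<?>) in.readObject();
    System.out.println(first == second); // true: the second read resolves to the same instance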

    In this case you should use writeUnshared(Object) instead of writeObject(Object); it bypasses the reference-sharing mechanism for the object you pass in.
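
    As a sketch, changing the write loop in the example above to

    for(int i = 0; i < size; ++i) {
        objectsOut.writeUnshared(theSet); // serialize the set's current contents, not a back-reference
        theSet.remove(i);
    }

    makes each readObject() call return a distinct set, and the output should become 3, 2, 1.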