I've just lost a couple of hours debugging my app, and I believe I've stumbled upon another Java bug (another one o_O)... sniff... I hope I'm wrong, because that would be sad :(
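I'm doing the following:
1. Create an EnumSet mask with some flags
2. Serialize it with ObjectOutputStream.writeObject(mask)
3. Clear the flags in mask (mask.clear())
4. Serialize mask again with the same ObjectOutputStream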
Expected result: the second serialized object differs from the first one (it reflects the changes made to the instance)
Obtained result: the second serialized object is an exact copy of the first one
The code:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.EnumSet;
import org.junit.Test;

public class EnumSetSerializationTest {
    enum MyEnum {
        ONE, TWO
    }

    @Test
    public void testEnumSetSerialize() throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream stream = new ObjectOutputStream(bos);
        EnumSet<MyEnum> mask = EnumSet.noneOf(MyEnum.class);
        mask.add(MyEnum.ONE);
        mask.add(MyEnum.TWO);
        System.out.println("First serialization: " + mask);
        stream.writeObject(mask);
        mask.clear();
        System.out.println("Second serialization: " + mask);
        stream.writeObject(mask); // same instance as the first write
        stream.close();
        ObjectInputStream istream = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println("First deserialized " + istream.readObject());
        System.out.println("Second deserialized " + istream.readObject());
    }
}
It prints:
First serialization: [ONE, TWO]
Second serialization: []
First deserialized [ONE, TWO]
Second deserialized [ONE, TWO]   <<<<<< Expecting [] here!!!!
Am I using EnumSet incorrectly? Do I have to create a new instance each time instead of clearing it?
Thanks for your input!
**** UPDATE ****
My initial idea was to use an EnumSet as a mask indicating which fields will be present or absent in the message that follows, as a sort of bandwidth and CPU usage optimization. That was very wrong!!! An EnumSet takes ages to serialize, and each instance takes about 30 (!!!) bytes! So much for the space economy :)
In a nutshell, while ObjectOutputStream is very fast for primitive types (as I had already found in a small test here: https://stackoverflow.com/a/33753694), it is painfully slooooow and inefficient with objects, especially small ones...
So I worked around it by making my own EnumSet backed by an int, and serializing/deserializing the int directly (not the object).
static class MyEnumSet<T extends Enum<T>> {
    // Bit i is set when the constant with ordinal i is in the set
    private int mask = 0;

    @Override
    public boolean equals(Object o) {
        if (o == null || getClass() != o.getClass()) return false;
        return mask == ((MyEnumSet<?>) o).mask;
    }

    @Override
    public int hashCode() {
        return mask;
    }

    private MyEnumSet(int mask) {
        this.mask = mask;
    }

    // clz is not used at runtime; it only fixes the type parameter T for the caller
    public static <T extends Enum<T>> MyEnumSet<T> noneOf(Class<T> clz) {
        return new MyEnumSet<T>(0);
    }

    public static <T extends Enum<T>> MyEnumSet<T> fromMask(Class<T> clz, int mask) {
        return new MyEnumSet<T>(mask);
    }

    public int mask() {
        return mask;
    }

    public MyEnumSet<T> add(T flag) {
        mask = mask | (1 << flag.ordinal());
        return this;
    }

    public void clear() {
        mask = 0;
    }
}
private final int N = 1000000;

@Test
public void testSerializeMyEnumSet() throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream(N * 100);
    ObjectOutputStream out = new ObjectOutputStream(bos);
    // Lists is Guava's com.google.common.collect.Lists; TestEnum, logger and TicToc belong to the test class (not shown)
    List<MyEnumSet<TestEnum>> masks = Lists.newArrayList();
    Random r = new Random(132477584521L);
    for (int i = 0; i < N; i++) {
        MyEnumSet<TestEnum> mask = MyEnumSet.noneOf(TestEnum.class);
        for (TestEnum f : TestEnum.values()) {
            if (r.nextBoolean()) {
                mask.add(f);
            }
        }
        masks.add(mask);
    }

    logger.info("Serializing " + N + " myEnumSets");
    long tic = TicToc.tic();
    for (MyEnumSet<TestEnum> mask : masks) {
        out.writeInt(mask.mask()); // write the raw int, not the object
    }
    TicToc.toc(tic);
    out.close();
    logger.info("Size: " + bos.size() + " (" + (bos.size() / N) + "b per object)");

    logger.info("Deserializing " + N + " myEnumSets");
    @SuppressWarnings("unchecked")
    MyEnumSet<TestEnum>[] deserialized = new MyEnumSet[masks.size()];
    ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
    tic = TicToc.tic();
    for (int i = 0; i < deserialized.length; i++) {
        deserialized[i] = MyEnumSet.fromMask(TestEnum.class, in.readInt());
    }
    TicToc.toc(tic);
    Assert.assertArrayEquals(masks.toArray(), deserialized);
}
It's roughly 130x faster during serialization (0.019 s vs 2.506 s) and roughly 23x faster during deserialization (0.021 s vs 0.489 s)...
MyEnumSets:
17/12/15 11:59:31 INFO - Serializing 1000000 myEnumSets
17/12/15 11:59:31 INFO - Elapsed time is 0.019 s
17/12/15 11:59:31 INFO - Size: 4019539 (4b per object)
17/12/15 11:59:31 INFO - Deserializing 1000000 myEnumSets
17/12/15 11:59:31 INFO - Elapsed time is 0.021 s
Regular EnumSets:
17/12/15 11:59:48 INFO - Serializing 1000000 enumSets
17/12/15 11:59:51 INFO - Elapsed time is 2.506 s
17/12/15 11:59:51 INFO - Size: 30691553 (30b per object)
17/12/15 11:59:51 INFO - Deserializing 1000000 enumSets
17/12/15 11:59:51 INFO - Elapsed time is 0.489 s
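It's not as safe, though. For example, it will not work for enums with more than 32 constants: for an int, Java uses only the low 5 bits of the shift distance, so 1 << 32 == 1 and the constant with ordinal 32 would silently set the same bit as ordinal 0.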
How can I ensure, on MyEnumSet creation, that the enum has at most 32 constants?
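One possible answer, as a minimal sketch assuming it is acceptable to keep the enum's Class object in each set: perform the check once in the private constructor using Class.getEnumConstants(), so every factory method goes through it. IntEnumSet, contains() and toEnumSet() are hypothetical names, not part of the code above or of the JDK.

```java
import java.util.EnumSet;

// Hypothetical int-backed enum set that rejects enums which do not fit in 32 bits.
final class IntEnumSet<T extends Enum<T>> {
    private final Class<T> type;
    private int mask;

    private IntEnumSet(Class<T> type, int mask) {
        // getEnumConstants() returns all constants; an int has Integer.SIZE (32) bits
        int n = type.getEnumConstants().length;
        if (n > Integer.SIZE) {
            throw new IllegalArgumentException(type.getName() + " has " + n
                    + " constants, but an int mask can hold at most " + Integer.SIZE);
        }
        this.type = type;
        this.mask = mask;
    }

    static <T extends Enum<T>> IntEnumSet<T> noneOf(Class<T> type) {
        return new IntEnumSet<>(type, 0);
    }

    static <T extends Enum<T>> IntEnumSet<T> fromMask(Class<T> type, int mask) {
        return new IntEnumSet<>(type, mask);
    }

    IntEnumSet<T> add(T flag) {
        mask |= 1 << flag.ordinal();
        return this;
    }

    boolean contains(T flag) {
        return (mask & (1 << flag.ordinal())) != 0;
    }

    int mask() {
        return mask;
    }

    // Converts back to a regular EnumSet, e.g. to use the rest of the collections API
    EnumSet<T> toEnumSet() {
        EnumSet<T> set = EnumSet.noneOf(type);
        for (T e : type.getEnumConstants()) {
            if (contains(e)) {
                set.add(e);
            }
        }
        return set;
    }
}
```

Storing the Class costs one reference per instance but does not change the int written to the stream. Since getEnumConstants() returns a fresh array on each call, a hot path might prefer to run the check once per enum class rather than in every constructor.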
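ObjectOutputStream remembers every object it has written. The first time an object is written, its full state is sent; after that, writing the same instance again only sends a reference (a handle) to the copy already in the stream. So if you modify an object and write it again, all ObjectOutputStream does is send that reference again, and the reader gets back the object it deserialized the first time.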
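A minimal sketch of this behaviour, using an ArrayList instead of an EnumSet so that nothing else (such as EnumSet's serialization proxy) is involved: one instance is written twice with a change in between, and both reads return one and the same object, without the change. BackReferenceDemo is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class BackReferenceDemo {
    // Writes one mutable list twice (mutating it in between) and returns both deserialized objects.
    static Object[] writeTwiceAndReadBack() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        List<String> list = new ArrayList<>();
        list.add("a");
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(list); // first write: the full state ["a"] is sent
            list.add("b");         // mutate the same instance
            out.writeObject(list); // second write: only a back-reference to the first copy
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            Object first = in.readObject();
            Object second = in.readObject();
            return new Object[] {first, second};
        }
    }

    public static void main(String[] args) throws Exception {
        Object[] read = writeTwiceAndReadBack();
        System.out.println(read[0] == read[1]); // true: both reads yield the same object
        System.out.println(read[1]);            // [a]: the added "b" never reached the stream
    }
}
```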
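This has a few consequences: changes made to an object after it was first written never reach the reader, and both the ObjectOutputStream and the ObjectInputStream keep a reference to every object they have seen, so none of those objects can be garbage collected while the streams are in use.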
The way to resolve this and get some memory back is to call reset() after writing each complete object, e.g. just before calling flush().
From the ObjectOutputStream.reset() Javadoc: "Reset will disregard the state of any objects already written to the stream. The state is reset to be the same as a new ObjectOutputStream. The current point in the stream is marked as reset so the corresponding ObjectInputStream will be reset at the same point. Objects previously written to the stream will not be referred to as already being in the stream. They will be written to the stream again."
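Applied to the test from the question, a minimal sketch looks like this (ResetDemo and roundTrip() are hypothetical names). With reset() between the two writes, the second readObject() returns an empty set instead of [ONE, TWO].

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.EnumSet;

public class ResetDemo {
    enum MyEnum { ONE, TWO }

    // Same steps as the question's test, plus a reset() after the first complete object.
    static Object[] roundTrip() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        EnumSet<MyEnum> mask = EnumSet.of(MyEnum.ONE, MyEnum.TWO);
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(mask);
            out.reset();           // forget all previously written objects
            mask.clear();
            out.writeObject(mask); // written in full again, so the cleared state is sent
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            Object first = in.readObject();
            Object second = in.readObject();
            return new Object[] {first, second};
        }
    }

    public static void main(String[] args) throws Exception {
        Object[] read = roundTrip();
        System.out.println("First deserialized " + read[0]);  // [ONE, TWO]
        System.out.println("Second deserialized " + read[1]); // []
    }
}
```

Since reset() also forgets class descriptors, the next object is sent with its full class description again, so resetting after every tiny object costs space; resetting once per message or batch may be a better trade-off.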
Another approach is to use writeUnshared; however, this only applies a shallow unshared-ness to the top-level object. In the case of EnumSet the top-level object will be different, but the Enum[] it wraps is still shared o_O
From the writeUnshared() Javadoc: "Writes an "unshared" object to the ObjectOutputStream. This method is identical to writeObject, except that it always writes the given object as a new, unique object in the stream (as opposed to a back-reference pointing to a previously serialized instance)."
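For small sets, an option that sidesteps the sharing entirely is to write a fresh copy each time, which is essentially the "create a new instance each time" idea from the question. A minimal sketch, assuming the cost of EnumSet.copyOf per write is acceptable (CopyDemo and roundTrip() are hypothetical names):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.EnumSet;

public class CopyDemo {
    enum MyEnum { ONE, TWO }

    static Object[] roundTrip() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        EnumSet<MyEnum> mask = EnumSet.of(MyEnum.ONE, MyEnum.TWO);
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            // Each copy is a new instance the stream has never seen, so it is written in full
            out.writeObject(EnumSet.copyOf(mask));
            mask.clear();
            out.writeObject(EnumSet.copyOf(mask)); // copyOf(EnumSet) also accepts an empty set
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            Object first = in.readObject();
            Object second = in.readObject();
            return new Object[] {first, second};
        }
    }

    public static void main(String[] args) throws Exception {
        Object[] read = roundTrip();
        System.out.println("First deserialized " + read[0]);  // [ONE, TWO]
        System.out.println("Second deserialized " + read[1]); // []
    }
}
```

The stream still keeps a reference to every copy it has written, so on a long-lived stream this does not remove the need for reset() to get the memory back.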
In short: no, this is not a bug; it is the expected behaviour.