I'm writing my own binary serializer optimized for game development. So far it's fully functional. It emits IL to generate the [de]serialization methods given a sequence of types in advance. The only missing feature is serializing things by reference; everything is currently serialized by value.
In order to implement it, I have to understand it first, and this is what I'm finding a bit tricky. Let me show you what I've understood through a couple of examples:
Example 1 (as seen here):
    using System;
    using System.Runtime.Serialization;

    public class Person
    {
        public string Name;
        public Person Friend;
    }

    class Program
    {
        static void Main(string[] args)
        {
            Person p1 = new Person();
            p1.Name = "John";
            Person p2 = new Person();
            p2.Name = "Mike";
            p1.Friend = p2;
            Person[] group = new Person[] { p1, p2 };
            var serializer = new DataContractSerializer(group.GetType(), null,
                0x7FFF /*maxItemsInObjectGraph*/,
                false /*ignoreExtensionDataObject*/,
                true /*preserveObjectReferences : this is where the magic happens */,
                null /*dataContractSurrogate*/);
            serializer.WriteObject(Console.OpenStandardOutput(), group);
        }
    }
Now this is completely understood. We have a root object, the array, referencing two unique persons. p1.Friend happens to be p2, so instead of serializing p1.Friend by value we just store an id that points to p2, which we've already serialized.
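To make the "store an id instead of the value" idea concrete, here is a minimal sketch of reference tracking within one object graph. It is not how DataContractSerializer is actually implemented; ToyGraphWriter, IdentityComparer and the obj#/ref# text format are hypothetical names made up for illustration. Note that in this sketch the full body is written wherever an object is first reached, so p2 ends up inline under p1.Friend and the second array slot becomes the reference.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Text;

public class Person
{
    public string Name;
    public Person Friend;
}

// Compares objects by identity, ignoring any Equals/GetHashCode overrides.
sealed class IdentityComparer : IEqualityComparer<object>
{
    bool IEqualityComparer<object>.Equals(object x, object y) => ReferenceEquals(x, y);
    int IEqualityComparer<object>.GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
}

// Hypothetical toy writer: emits text instead of bytes so the ids are visible.
public static class ToyGraphWriter
{
    public static string Write(Person[] group)
    {
        // One id table for the whole graph reachable from this root.
        var ids = new Dictionary<object, int>(new IdentityComparer());
        var sb = new StringBuilder();
        foreach (var p in group)
        {
            WritePerson(p, ids, sb);
            sb.Append(';');
        }
        return sb.ToString();
    }

    static void WritePerson(Person p, Dictionary<object, int> ids, StringBuilder sb)
    {
        if (p == null) { sb.Append("null"); return; }

        // Already written in this graph: emit only the id.
        if (ids.TryGetValue(p, out int id)) { sb.Append("ref#").Append(id); return; }

        // First encounter: register the id before recursing so cycles terminate.
        id = ids.Count + 1;
        ids.Add(p, id);
        sb.Append("obj#").Append(id).Append("{Name=").Append(p.Name).Append(",Friend=");
        WritePerson(p.Friend, ids, sb);
        sb.Append('}');
    }
}
```

For the group from Example 1, ToyGraphWriter.Write(group) returns obj#1{Name=John,Friend=obj#2{Name=Mike,Friend=null}};ref#2; and a reader would rebuild the same table in the same order to resolve ref#2.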
However, have a look at this second example:
    static void Example2()
    {
        var p1 = new Person() { Name = "Diablo" };
        var p2 = new Person() { Name = "Mephesto" };
        p1.Friend = p2;
        var serializer = new DataContractSerializer(typeof(Person), null, 0x7FFF, false, true, null);
        serializer.WriteObject(Console.OpenStandardOutput(), p1);
        Console.WriteLine("\n");
        serializer.WriteObject(Console.OpenStandardOutput(), p2);
    }
Now, according to my understanding: when serializing p1, the serializer will serialize p1.Name and p1.Friend. In the second WriteObject call, the serializer has already serialized p2 (which is p1.Friend), so it just serializes an id that points to p1.Friend instead of serializing it by value.
Running the code and viewing the output, that doesn't seem to be the case. In the second output we see the serializer serializing p2 by value, as if it hadn't come across it yet... and that I didn't get. It's as if there's an internal id counter that gets reset at the end of WriteObject.
Here's another similar example:
    static void Example3()
    {
        var p1 = new Person() { Name = "Diablo" };
        var p2 = p1;
        var serializer = new DataContractSerializer(typeof(Person), null, 0x7FFF, false, true, null);
        serializer.WriteObject(Console.OpenStandardOutput(), p1);
        Console.WriteLine("\n");
        serializer.WriteObject(Console.OpenStandardOutput(), p2);
    }
Again, the second output shows that we're serializing p2 as if we haven't encountered a definition for it yet.
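The "id counter that gets reset" guess can be sketched as a serializer whose id table lives only for the duration of one write call. This is a hypothetical toy (ToySerializer, IdentityComparer and the shareTableAcrossCalls switch are invented for illustration) and only an assumption about why the real output looks the way it does, not a description of DataContractSerializer's internals.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Compares objects by identity, ignoring any Equals/GetHashCode overrides.
sealed class IdentityComparer : IEqualityComparer<object>
{
    bool IEqualityComparer<object>.Equals(object x, object y) => ReferenceEquals(x, y);
    int IEqualityComparer<object>.GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
}

// Hypothetical toy serializer: returns a text token instead of writing bytes.
public sealed class ToySerializer
{
    readonly bool shareTableAcrossCalls;
    Dictionary<object, int> table = NewTable();

    public ToySerializer(bool shareTableAcrossCalls)
    {
        this.shareTableAcrossCalls = shareTableAcrossCalls;
    }

    static Dictionary<object, int> NewTable() =>
        new Dictionary<object, int>(new IdentityComparer());

    // Returns "obj#N" the first time an instance is seen in the current
    // scope and "ref#N" on later encounters within that same scope.
    public string WriteObject(object root)
    {
        // Per-call scope: every call starts with an empty id table,
        // so nothing from an earlier call can be referenced.
        if (!shareTableAcrossCalls) table = NewTable();

        if (table.TryGetValue(root, out int id)) return "ref#" + id;
        id = table.Count + 1;
        table.Add(root, id);
        return "obj#" + id;
    }
}
```

With new ToySerializer(false), writing the same instance twice yields obj#1 both times, which matches what Example 3 shows. With new ToySerializer(true) the second call yields ref#1, but that second payload can then only be read by something that has also read the first one. Keeping each payload readable on its own is one plausible reason to scope the table to a single call.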
Note that I didn't choose DataContractSerializer for any particular reason; any serializer that supports serializing by reference works.
I tried using ILSpy on DataContractSerializer, but I got lost quickly and couldn't figure out much.
- In Example2, why didn't the serializer store an id to p1.Friend when serializing p2?
- Is 'serializing by reference' only applied to a single object hierarchy, or how does it work in general?

I've tagged protobuf-net cause it's similar in that it's a binary serializer and emits IL. I would love to hear how serializing by reference is implemented there :p
Additional thought: if you apply this to strings, you might want to special-case them to use effective (value) equality rather than reference equality; there's no point serialising two different instances (references) of the same string.
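A minimal sketch of that string special case, assuming a separate table keyed by string content rather than by instance. StringTable and TryGetExisting are hypothetical names, not part of any real serializer's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical string table: two distinct string instances with the same
// characters map to the same id, because lookup is by ordinal value equality.
public sealed class StringTable
{
    readonly Dictionary<string, int> ids =
        new Dictionary<string, int>(StringComparer.Ordinal);

    // Returns true if an equal string was already registered (write only the id),
    // false if this is the first occurrence (write the id and the characters).
    public bool TryGetExisting(string value, out int id)
    {
        if (ids.TryGetValue(value, out id)) return true;
        id = ids.Count + 1;
        ids.Add(value, id);
        return false;
    }
}
```

Usage would be along the lines of: if TryGetExisting returns true, emit just the id; otherwise emit the id followed by the string data. On deserialization, both fields would then end up pointing at one shared string instance, which is harmless because strings are immutable.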