I have been trying to optimize the JSON serialization of over 500K POCOs to be imported into MongoDB, and have been running into nothing but headaches. I originally tried Newtonsoft's JsonConvert.SerializeObject() method, but that was taking too long. Then, based on the advice of several posts here on SO, Newtonsoft's own site, and other sources, I attempted to manually serialize the objects, but have not noticed much, if any, performance gain.
This is the code I use to kick off the serialization process. Above each line, in the comments, is the amount of time that each individual operation took to complete, given a dataset of 1,000 objects.
//
// Get a reference to the MongoDB collection
var collection = _database.GetCollection<BsonDocument>("sessions");
//
// 8ms - Get the number of records already in MongoDB. We will skip this many when retrieving more records from the RDBMS.
Int32 skipCount = collection.AsQueryable().Count();
//
// 74ms - Get the records as POCOs that will be imported into MongoDB (using Telerik OpenAccess ORM)
List<Session> sessions = uow.DbContext.Sessions.Skip(skipCount).Take(1000).ToList();
//
// The duration times displayed in the foreach loop are cumulative totals across
// ALL the items, not just a single one.
foreach (Session item in sessions)
{
    StringWriter sw = new StringWriter();
    JsonTextWriter writer = new JsonTextWriter(sw);
    //
    // 585,934ms (yes - 9.75 MINUTES) - Serialization of 1000 POCOs into JSON strings. Total duration of ALL 1000 objects.
    item.ToJSON(ref writer);
    //
    // 16ms - Convert the StringWriter into a String. Total duration of ALL 1000 objects.
    String json = sw.ToString();
    //
    // 376ms - Deserialize the JSON into MongoDB BsonDocument instances. Total duration of ALL 1000 objects.
    BsonDocument doc = MongoDB.Bson.Serialization.BsonSerializer.Deserialize<BsonDocument>(json);
    //
    // 8ms - Insert the BsonDocument into the MongoDB data store. Total duration of ALL 1000 objects.
    collection.InsertOne(doc);
}
Currently each individual object takes about 0.5 - 0.75 sec to serialize to a JSON document, which equals about 10 minutes for 1,000 documents, 100 minutes for 10,000 documents, and so on. I find the durations fairly consistent, but ultimately this means that loading the 600K records will take about 125 straight hours of processing (600,000 × 0.75s ≈ 450,000s). This is for a messaging system that could eventually be adding 20K - 100K new documents per day, so performance is a REAL issue for us.
The objects I am serializing contain a couple of layers of "navigation" properties or "nested documents" (depending on whether you view them through an ORM or MongoDB lens), but are not otherwise particularly complex or noteworthy.
The serialization code passes the JsonTextWriter instance created in the previous code sample into the ToJSON methods of the POCOs, so we are not creating a new writer for each model to use when serializing itself.
The following code is a truncated example of a few of the objects, intended to illustrate the implementation technique (how the writer is passed and how the JSON is manually constructed). There are many more properties and a few more related/nested objects, but this is an example of the "deepest" traversal I have to make.
It begins with the Session object, which recursively calls its dependent properties to serialize themselves.
public class Session
{
    #region properties
    public Guid SessionUID { get; set; }
    public String AssetNumber { get; set; }
    public Int64? UTCOffset { get; set; }
    public DateTime? StartUTCTimestamp { get; set; }
    public DateTime? StartTimestamp { get; set; }
    public DateTime? EndTimestamp { get; set; }
    public String Language { get; set; }
    // ... many more properties
    #endregion properties

    #region navigation properties
    public virtual IList<SessionItem> Items { get; set; }
    #endregion navigation properties

    #region methods
    public void ToJSON(ref JsonTextWriter writer)
    {
        Session session = this;
        // {
        writer.WriteStartObject();

        #region write out the properties
        writer.WritePropertyName("SessionUID");
        writer.WriteValue(session.SessionUID);
        writer.WritePropertyName("AssetNumber");
        writer.WriteValue(session.AssetNumber);
        writer.WritePropertyName("UTCOffset");
        writer.WriteValue(session.UTCOffset);
        writer.WritePropertyName("StartUTCTimestamp");
        writer.WriteValue(session.StartUTCTimestamp);
        writer.WritePropertyName("StartTimestamp");
        writer.WriteValue(session.StartTimestamp);
        writer.WritePropertyName("EndTimestamp");
        writer.WriteValue(session.EndTimestamp);
        writer.WritePropertyName("Language");
        writer.WriteValue(session.Language);
        // ... continues writing the remaining instance properties
        #endregion write out the properties

        #region include the navigation properties
        // "Items": [ {}, {}, {} ]
        writer.WritePropertyName("Items");
        writer.WriteStartArray();
        foreach (SessionItem item in this.Items)
        {
            item.ToJSON(ref writer);
        }
        writer.WriteEndArray();
        #endregion include the navigation properties

        // }
        writer.WriteEndObject();
    }
    #endregion methods
}
public class SessionItem
{
    #region properties
    public Int64 ID { get; set; }
    public Int64 SessionID { get; set; }
    public Int32 Quantity { get; set; }
    public Decimal UnitPrice { get; set; }
    #endregion properties

    #region navigation properties
    public virtual Session Session { get; set; }
    public virtual IList<SessionItemAttribute> Attributes { get; set; }
    #endregion navigation properties

    #region public methods
    public void ToJSON(ref JsonTextWriter writer)
    {
        // {
        writer.WriteStartObject();

        #region write out the properties
        writer.WritePropertyName("ID");
        writer.WriteValue(this.ID);
        writer.WritePropertyName("SessionID");
        writer.WriteValue(this.SessionID);
        writer.WritePropertyName("Quantity");
        writer.WriteValue(this.Quantity);
        writer.WritePropertyName("UnitPrice");
        writer.WriteValue(this.UnitPrice);
        #endregion write out the properties

        #region include the navigation properties
        // "Attributes": [ {}, {}, {} ]
        writer.WritePropertyName("Attributes");
        writer.WriteStartArray();
        foreach (SessionItemAttribute item in this.Attributes)
        {
            item.ToJSON(ref writer);
        }
        writer.WriteEndArray();
        #endregion include the navigation properties

        // }
        writer.WriteEndObject();
    }
    #endregion public methods
}
public class SessionItemAttribute : BModelBase, ISingleID
{
    public Int64 ID { get; set; }
    public String Name { get; set; }
    public String Datatype { get; set; }
    public String Value { get; set; }
    // typed value properties referenced by ToJSON below (types inferred from the names)
    public String StringValue { get; set; }
    public Decimal? NumberValue { get; set; }
    public DateTime? DateValue { get; set; }
    public Boolean? BooleanValue { get; set; }

    #region navigation properties
    public Int64 ItemID { get; set; }
    public virtual SessionItem Item { get; set; }
    public Int64 ItemAttributeID { get; set; }
    public virtual ItemAttribute ItemAttribute { get; set; }
    #endregion navigation properties

    #region public methods
    public void ToJSON(ref JsonTextWriter writer)
    {
        // {
        writer.WriteStartObject();

        #region write out the properties
        writer.WritePropertyName("ID");
        writer.WriteValue(this.ID);
        writer.WritePropertyName("Name");
        writer.WriteValue(this.Name);
        writer.WritePropertyName("Datatype");
        writer.WriteValue(this.Datatype);
        writer.WritePropertyName("StringValue");
        writer.WriteValue(this.StringValue);
        writer.WritePropertyName("NumberValue");
        writer.WriteValue(this.NumberValue);
        writer.WritePropertyName("DateValue");
        writer.WriteValue(this.DateValue);
        writer.WritePropertyName("BooleanValue");
        writer.WriteValue(this.BooleanValue);
        writer.WritePropertyName("ItemID");
        writer.WriteValue(this.ItemID);
        writer.WritePropertyName("ItemAttributeID");
        writer.WriteValue(this.ItemAttributeID);
        #endregion write out the properties

        // }
        writer.WriteEndObject();
    }
    #endregion public methods
}
I suspect that I am overlooking something, or that the problem lies in the manner in which I am implementing the serialization. One SO poster claimed to have reduced his load time from 28 seconds to 31 milliseconds by manually serializing the data, so I was expecting somewhat more dramatic results. In fact, this is nearly the exact same performance I observed using Newtonsoft's JsonConvert.SerializeObject() method.
Any help diagnosing the source of latency in the serialization would be most appreciated. Thank you!
UPDATE
While I have not yet extricated the data access from the ORM, I have been able to confirm that the latency is actually coming from the ORM (thank you, commenters). When I added the FetchStrategy as suggested, the latency was still there, but the time moved from being spent on serialization to being spent on the query (i.e., the loading of the navigation properties).
So the issue isn't serialization as much as it is optimizing the data retrieval.
In an effort to provide closure to this question, I wanted to post my solution.
After further research, the commenters on the original post had it correct: this was not a serialization issue but a data access issue. The ORM was lazily loading the navigation properties as they were requested during the serialization process. When I implemented a FetchStrategy to "greedily" fetch the associated objects, the source of the latency shifted from the counters I had in place around the serialization process to the counters I placed around data access.
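For anyone who hits the same wall, here is roughly what the greedy fetch looks like. Treat this as a sketch rather than a drop-in: the LoadWith calls mirror the navigation properties that ToJSON traverses above, but verify the exact FetchStrategy API against the Telerik OpenAccess (Data Access) documentation for your version.
//
// Sketch: greedily fetch the navigation graph up front
// (assumes the Telerik.OpenAccess.FetchOptimization namespace)
var fetchStrategy = new FetchStrategy();
//
// Eager-load the navigation properties that ToJSON walks, so serialization
// never triggers a per-object lazy-load round trip to the RDBMS
fetchStrategy.LoadWith<Session>(s => s.Items);
fetchStrategy.LoadWith<SessionItem>(i => i.Attributes);
uow.DbContext.FetchStrategy = fetchStrategy;
//
// The same Skip/Take query now materializes sessions with Items and Attributes already populated
List<Session> sessions = uow.DbContext.Sessions.Skip(skipCount).Take(1000).ToList();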
I was able to resolve this by adding indexes on the foreign key fields in the database. Latency dropped by over 90%, and what was taking 100+ minutes to run now completes in 10.
So thanks to the folks who commented and helped remove my blinders by reminding me of what else was going on.