I have written a very simple application tha accepts batches of telemetry log rows in XML files and transcribes them to a table on a SQL Server. Each one of these blocks of incoming data I refer to as a "chapter".
For performance reasons, each chapter is wrapped in a transaction so that instead of a hundred implicit transactions there's a single transaction for 100 new rows.
Unfortunately, there are duplicates, and due to the shared transactional context they take out the whole chapter. This is uncommon but not rare enough to ignore. So I added a loop, a retry flag and a catch (SqlException) that maintains a skip-list. When a row barfs on insert, I roll back the transaction, create a new one, add the row number to the skip list, set the retry flag and let it loop. All the rows are reprocessed except for rows that are skip-listed. If a second row barfs, the same thing happens except the skip-list has two items.
This whole arrangement works. It's the end-game that's giving me curry. When the chapter is processed end to end without any more exceptions, the loop exits and then I check whether the skip-list is empty. When it isn't empty, I attempt to use Trace.TraceWarning() to write an event log entry detailing the failed rows and the block of XML for later forensics.
It's at this point that things go south. This Trace statement barfs with a Win32 exception that claims 'device not ready'.
Has anyone seen anything like this?
Overnight I noticed that occasionally it works without complaint. I have inserted a Thread.Sleep() just before the Trace.TraceWarning() so it will be interesting to see whether this sorts it out. I should probably add that I noticed that the message is successfully logged to Visual Studio's trace listener and this is what makes me think it's speed or timing related.
A day later and it is clear that sleeping the thread makes no difference. I have re-written the whole business so that a StringBuilder accumulates state, and a single trace statement occurs after the loop exits. The Win32Exception is no longer in evidence, even when multiple rows are rejected requiring more than two passes. What exactly causes that particular exception remains unclear to me. I hope that someone can cast light into this dark corner, but I have nothing further to add, since a single log entry per chapter is ideal; the "workaround" would be an improvement to the code even without the Win32Exception.
Spoke too soon. The evil Win32Exception is back. Here's the code as it stands:
public void Write(string xml)
{
StringBuilder sb = new StringBuilder();
//Trace.TraceInformation(xml);
SqlConnection cxn = new SqlConnection(Properties.Settings.Default.DatabaseBob);
cxn.Open();
SqlTransaction txn = null;
SqlCommand cmd = new SqlCommand("INSERT INTO LOG(MAC, snip more fields ) VALUES(@MAC, snip more params)", cxn);
var P = cmd.Parameters;
P.Add(new SqlParameter("@MAC", SqlDbType.BigInt));
/*snip define more SQL params*/
XmlDocument doc = new XmlDocument();
bool barfed = false;
List<int> skipList = new List<int>();
_stopwatch.Start();
int i = 0;
doc.LoadXml(xml);
var root = doc.FirstChild.NextSibling;
P["@MAC"].Value = ulong.Parse(root.Attributes["MAC"].Value, System.Globalization.NumberStyles.AllowHexSpecifier);
P["@D2"].Value = P["@D3"].Value = 0; //COM2
do
try
{
cmd.Transaction = txn = cxn.BeginTransaction();
i = 0;
barfed = false;
for (var n = root.FirstChild; n != null; n = n.NextSibling)
{
foreach (XmlAttribute attr in n.Attributes)
{
var p = P["@" + attr.Name];
switch (p.SqlDbType)
{
case SqlDbType.DateTime:
p.Value = DateTime.Parse(attr.Value).ToUniversalTime();
break;
case SqlDbType.Float:
p.Value = float.Parse(attr.Value);
break;
case SqlDbType.Bit:
p.Value = Convert.ToBoolean(int.Parse(attr.Value));
break;
}
}
i++;
if (!skipList.Contains(i))
if ((DateTime)P["@GPSTIME"].Value > DateTime.UtcNow.AddMinutes(1))
{
sb.AppendFormat("Node {0} is being skipped because it has a future date.\r\n", i);
skipList.Add(i);
}
else
cmd.ExecuteNonQuery();
}
txn.Commit();
sb.AppendFormat("{0} logs per {1}ms", i, _stopwatch.ElapsedMilliseconds);
}
catch (SqlException)
{
sb.AppendFormat("Node {0} is being skipped because it is a duplicate.\r\n", i);
txn.Rollback();
skipList.Add(i);
barfed = true;
}
catch (Exception ex)
{
sb.AppendLine(ex.Message);
txn.Rollback();
}
while (barfed);
_stopwatch.Reset();
if (skipList.Count == 0)
Trace.TraceInformation(sb.ToString());
else
{
sb.AppendLine();
sb.Append(xml);
Trace.TraceWarning(sb.ToString());
}
}
And this is the stack trace from the exception:
at System.Diagnostics.EventLogInternal.InternalWriteEvent(UInt32 eventID, UInt16 category, EventLogEntryType type, String[] strings, Byte[] rawData, String currentMachineName)
at System.Diagnostics.EventLogInternal.WriteEvent(EventInstance instance, Byte[] data, Object[] values)
at System.Diagnostics.EventLog.WriteEvent(EventInstance instance, Object[] values)
at System.Diagnostics.EventLogTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType severity, Int32 id, String message)
at System.Diagnostics.TraceInternal.TraceEvent(TraceEventType eventType, Int32 id, String format, Object[] args)
at System.Diagnostics.Trace.TraceWarning(String message)
at Bob.ChapterWriter.Write(String xml) in I:\Project Bob\LogWriter\ChapterWriter.cs:line 179
at SyncInvokeWrite(Object , Object[] , Object[] )
at System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke(Object instance, Object[] inputs, Object[]& outputs)
at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(MessageRpc& rpc)
The Write method is invoked by WCF in response to an MSMQ message arriving. The content of the message is the XML string. Removing the event log listener makes the problem go away (so far).
For those examining the stack trace, line 179 is at the end of the pasted code. Here it is again:
Trace.TraceWarning(sb.ToString());
A thought comes to me: I wonder is it because the XML makes the message too big for the event log? I already have a setting that limits the size of a chapter file. I'll try reducing it and see what happens.
Yep, the real problem was oversized message text. The maximum size of an event log entry is 64K including all the Microsoft headers and whatnot. I knew that but it didn't occur to me that this could be the problem because I had the XML limited to 32K. The reason it was too much is this: when you put XML in text that will be embedded in XML, it gets escaped which increases the size of every illegal character by a factor of four. All my angle brackets and double-quotes were quadrupling in size. And there are lots of them.
The take-aways from this are: