I have a data structure that I read over UDP, and it looks like this:
$T_DATA,17,<?xml version="1.0"?>
<TData id="Channel 4">
<Meta>
<InstrumentID>17</InstrumentID>
<DatagramID>4</DatagramID>
<Timestamp>2024-09-10 15:00:57.480</Timestamp>
</Meta>
<Data>
<Value ID="1" type="0">108.33</Value>
<Value ID="2" type="0">-39</Value>
<Value ID="3" type="0">422.9</Value>
</Data>
</TData>
????
I want to remove everything before <TData and everything after </TData>.
My code that does exactly that looks like this:
static string StripNonXmlContent(string input)
{
// Find the start of the XML tag and ignore anything before it
string stopString = "</TData>";
int xmlStartIndex = input.IndexOf("<TData");
// TODO: Explain magic number 7. Without it return string gets too short.
int xmlStopIndex = input.LastIndexOf(stopString) + stopString.Length + 7;
// If both the start and end of the XML are found
if (xmlStartIndex >= 0 && xmlStopIndex >= 0)
{
// Extract everything between <TData> and </TData>
return input.Substring(xmlStartIndex, (xmlStopIndex - xmlStartIndex));
}
// If no valid XML structure is found, return the input as is
return input;
}
In the beginning my stop index was computed as xmlStopIndex = input.LastIndexOf(stopString) + stopString.Length; but that cut the returned string too short: the last line became just <T. Through trial and error I found that adding the magic number 7 made it work.
Can somebody explain to me why this is?
Further investigation. The string actually looks like this:
???$?T?C?L?I?E?N?T?_?D?A?T?A?,?1?7?,?<???x?m?l? ?v?e?r?s?i?o?n?=?"?1?.?0?"???>?
?<?T?D?a?t?a? ?i?d?=?"?C?h?a?n?n?e?l? ?5?"?>?
? ? ? ? ?<?M?e?t?a?>?
? ? ? ? ? ? ? ? ?<?I?n?s?t?r?u?m?e?n?t?I?D?>?1?7?<?/?I?n?s?t?r?u?m?e?n?t?I?D?>?
? ? ? ? ? ? ? ? ?<?D?a?t?a?g?r?a?m?I?D?>?5?<?/?D?a?t?a?g?r?a?m?I?D?>?
? ? ? ? ? ? ? ? ?<?T?i?m?e?s?t?a?m?p?>?2?0?2?4?-?0?9?-?1?0? ?1?5?:?3?4?:?5?6?.?4?8?6?<?/?T?i?m?e?s?t?a?m?p?>?
? ? ? ? ?<?/?M?e?t?a?>?
? ? ? ? ?<?D?a?t?a?>?
? ? ? ? ? ? ? ? ?<?V?a?l?u?e? ?I?D?=?"?1?"? ?t?y?p?e?=?"?0?"?>?1?0?5?.?7?4?<?/?V?a?l?u?e?>?
? ? ? ? ? ? ? ? ?<?V?a?l?u?e? ?I?D?=?"?2?"? ?t?y?p?e?=?"?0?"?>?-?3?9?<?/?V?a?l?u?e?>?
? ? ? ? ? ? ? ? ?<?V?a?l?u?e? ?I?D?=?"?3?"? ?t?y?p?e?=?"?0?"?>?3?3?5?.?4?<?/?V?a?l?u?e?>?
? ? ? ? ?<?/?D?a?t?a?>?
?<?/?T?D?a?t?a?>?
????
And there are seven ? characters inside the string </TData>, which explains why I have to add 7 to make this work.
There must be something wrong with how I read the data. It looks like this:
byte[] receiveBytes = udpClient.Receive(ref remoteEndPoint);
string receivedData = Encoding.ASCII.GetString(receiveBytes, 0, receiveBytes.Length);
Console.WriteLine($"Received data from {remoteEndPoint}:");
Console.WriteLine(receivedData);
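It looks like your data is arriving as UTF-16, but you're decoding it as ASCII. In UTF-16 every ASCII-range character takes two bytes, one of which is 0x00, and Encoding.ASCII turns each of those zero bytes into a character of its own. Those are the ? marks between the letters in your dump. Try decoding the data as UTF-16 instead (Encoding.Unicode in .NET, which is little-endian).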
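A minimal sketch of that suggestion, assuming the sender uses little-endian UTF-16 (the dump shows the filler after each letter, which is consistent with that; if the byte order turns out to be the other way round, Encoding.BigEndianUnicode would be the one to try). The class name DatagramDecoder and the method Decode are made up for illustration; StripNonXmlContent is the method from the question with the magic number removed and ordinal comparisons added.

```csharp
using System;
using System.Text;

static class DatagramDecoder
{
    // Decode the raw datagram as UTF-16 little-endian (Encoding.Unicode in .NET)
    // instead of ASCII, so each two-byte code unit becomes one character.
    public static string Decode(byte[] receiveBytes)
    {
        return Encoding.Unicode.GetString(receiveBytes, 0, receiveBytes.Length);
    }

    // Keep only the <TData ...>...</TData> element.
    // With correctly decoded text no extra offset is needed.
    public static string StripNonXmlContent(string input)
    {
        const string stopString = "</TData>";
        int start = input.IndexOf("<TData", StringComparison.Ordinal);
        int stop = input.LastIndexOf(stopString, StringComparison.Ordinal);

        // If either marker is missing (or they are out of order), return the input as is.
        if (start < 0 || stop < start)
        {
            return input;
        }

        return input.Substring(start, stop + stopString.Length - start);
    }
}
```

In the receive code from the question this would replace the Encoding.ASCII line, for example: string receivedData = DatagramDecoder.Decode(receiveBytes); followed by DatagramDecoder.StripNonXmlContent(receivedData).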