A third-party system streams XML to me over TCP. The total transmitted XML content (not a single message of the stream, but all messages concatenated) looks like this:
<root>
<insert ....><remark>...</remark></insert>
<delete ....><remark>...</remark></delete>
<insert ....><remark>...</remark></insert>
....
<insert ....><remark>...</remark></insert>
</root>
Every line of the sample above can be processed individually. Since this is a streaming process, I cannot wait until everything has arrived; I have to process the content as it comes. The problem is that the chunks can be cut at any point; tag boundaries are not respected. Do you have any good advice on how to process the content when it arrives in fragments like this?
Chunk 1:
<root>
<insert ....><rem
Chunk 2:
ark>...</remark></insert>
<delete ....><remark>...</remark></delete>
<insert ....><remark>...</rema
Chunk N:
rk></insert>
....
<insert ....><remark>...</remark></insert>
</root>
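One possible approach to fragments like these, shown only as a rough sketch, is an incremental (push) parser that accepts partial input and reports an element once its closing tag has arrived. The example below uses Python's standard xml.etree.ElementTree.XMLPullParser. The function name iter_records, the id attributes and the remark texts in the usage lines are made up for illustration, since the real attributes are elided above.

```python
import xml.etree.ElementTree as ET


def iter_records(chunks):
    """Yield (tag, remark_text) for each completed direct child of the root.

    `chunks` is any iterable of bytes (or str) fragments, cut at arbitrary
    positions. This is a hypothetical helper, not part of any library.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    root = None
    for chunk in chunks:
        parser.feed(chunk)  # partial tags are buffered inside the parser
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
                if depth == 1:
                    root = elem  # the never-closing <root> element
            else:
                depth -= 1
                if depth == 1:
                    # A direct child of <root> (insert/delete) is complete.
                    yield elem.tag, elem.findtext("remark")
                    # Drop processed children so memory does not grow
                    # on a stream that never ends.
                    root.clear()


# Usage with fragments cut in the middle of tags (sample data is invented):
sample = [
    b'<root>\n<insert id="1"><rem',
    b'ark>a</remark></insert>\n<delete id="2"><remark>b</rema',
    b'rk></delete>\n',
]
for tag, remark in iter_records(sample):
    print(tag, remark)
```

Because the closing root tag never has to arrive, records are handed over as soon as their own end tag is seen. Whether this fits depends on the real stream being well-formed XML up to the point received so far.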
EDIT:
While processing speed is not a concern (no real-time requirements), I cannot wait for the entire message. In practice, the last chunk never arrives: the third-party system sends messages whenever it detects changes, so the stream never ends.
After further investigation we found that the XML stream was being split by the TCP buffer whenever it filled up. The cuts therefore fell at arbitrary positions in the byte stream, even inside multi-byte Unicode characters. We had to assemble the parts at the byte level and convert the result back to text. If the conversion failed, we waited for the next byte chunk and tried again.
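For illustration, the byte-level reassembly can be sketched with an incremental decoder, which holds back an incomplete multi-byte sequence until the remaining bytes arrive, instead of retrying a failed conversion by hand. This is not the code we actually used; it assumes the stream is UTF-8, and the name decode_stream and the sample data are hypothetical.

```python
import codecs


def decode_stream(byte_chunks, encoding="utf-8"):
    """Yield text for each byte chunk, even when a chunk ends mid-character.

    Assumption: the stream is encoded as `encoding` (UTF-8 here).
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    for chunk in byte_chunks:
        # Trailing bytes of an unfinished character stay buffered in the
        # decoder and are completed by the next chunk.
        text = decoder.decode(chunk)
        if text:
            yield text


# Usage: the cuts fall inside the two-byte "é" and the three-byte "€".
data = "<remark>é€</remark>".encode("utf-8")
parts = [data[:9], data[9:11], data[11:]]
print(list(decode_stream(parts)))
```

A genuinely invalid byte sequence still raises UnicodeDecodeError with errors="strict", so real corruption is not silently hidden by the buffering.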