I need to read a large XML file in C#, but with a few other considerations.
Formatting is as such:
<RecordSet>
<Record>
<TimeStamp>...</TimeStamp>
...
</Record>
<Record>
<TimeStamp>...</TimeStamp>
...
</Record>
</RecordSet>
There are a few obvious routes I've considered, such as using XmlReader and scanning linearly, or even adding concurrency. However, due to the nature of the data (ordered, only a subset needed, etc.), I suspect there is a more efficient way to retrieve it. I have also looked into traversing the XML backwards, but it's unclear whether that is possible without loading the data upfront.
While sample code would be nice, any theoretical approaches would be greatly appreciated too.
For a fast XML parser, reading the file itself may well be the bottleneck. If so, you can speed things up by avoiding reading the whole file, which is possible in this special case (ordered data, simple XML structure).
You can implement a kind of binary search over the file for the position where the timestamp range you are interested in begins, and only read and parse the file from that position onwards.
This only works if your data is read from a device with random access (true for most media nowadays, unless you are reading from some kind of stream).
Use binary search to pick a byte position to start reading; from there, skip bytes until you detect the start of the next <Record> tag, read its timestamp, and then decide (based on that timestamp) whether to jump forward or backward in the file to home in on the one-hour-ago position.
Space-wise, you never need more than one record's worth of buffer while reading, plus space to store the result.
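The search described above can be sketched roughly like this. It is a sketch under stated assumptions, not a drop-in implementation: the names (RecordSeeker, FindStart) are mine, and it assumes an ASCII-compatible encoding, exactly one <TimeStamp> per <Record>, and records sorted by timestamp.

```csharp
using System;
using System.IO;
using System.Text;

static class RecordSeeker
{
    // Binary search over byte offsets: returns the byte offset of the first
    // <Record> whose <TimeStamp> is >= target, or -1 if there is none.
    public static long FindStart(string path, DateTime target)
    {
        using var fs = File.OpenRead(path);
        long lo = 0, hi = fs.Length;
        while (lo < hi)
        {
            long mid = lo + (hi - lo) / 2;
            long recPos = NextRecord(fs, mid, out DateTime? ts);
            if (recPos < 0 || ts == null) { hi = mid; continue; } // nothing after mid
            if (ts < target) lo = recPos + 1;  // record too old: look later in file
            else hi = mid;                     // record recent enough: look earlier
        }
        return NextRecord(fs, lo, out _);
    }

    // Scans forward from 'from' for the next "<Record>" tag and parses the
    // <TimeStamp> that follows it. Returns the tag's byte offset, or -1.
    static long NextRecord(FileStream fs, long from, out DateTime? ts)
    {
        ts = null;
        long recPos = IndexOf(fs, from, "<Record>");
        if (recPos < 0) return -1;
        long tsPos = IndexOf(fs, recPos, "<TimeStamp>");
        if (tsPos >= 0)
        {
            fs.Position = tsPos + "<TimeStamp>".Length;
            var sb = new StringBuilder();
            int b;
            while ((b = fs.ReadByte()) >= 0 && b != '<') sb.Append((char)b);
            if (DateTime.TryParse(sb.ToString(), out var parsed)) ts = parsed;
        }
        return recPos;
    }

    // Naive forward search for an ASCII pattern; good enough for a sketch
    // (wrap the stream in a BufferedStream for real use).
    static long IndexOf(FileStream fs, long from, string pattern)
    {
        byte[] pat = Encoding.ASCII.GetBytes(pattern);
        fs.Position = from;
        int matched = 0, b;
        while ((b = fs.ReadByte()) >= 0)
        {
            matched = (b == pat[matched]) ? matched + 1 : (b == pat[0] ? 1 : 0);
            if (matched == pat.Length) return fs.Position - pat.Length;
        }
        return -1;
    }
}
```

Each of the O(log n) probes scans at most one record's length before hitting the next <Record> tag, so even a multi-gigabyte file needs only a handful of small reads.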
To clear up the discussion with @jdweng below: once you can detect where a <Record> tag starts (which is easy for this particular XML structure), you can avoid reading the whole file and just use e.g. XmlReader to extract the data from the tag you want to start at.
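A sketch of that hand-off to XmlReader (the method name and list-based return are my own choices): ConformanceLevel.Fragment lets the reader start at an arbitrary <Record> even though it never saw the <RecordSet> root.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class RecordExtractor
{
    // Reads records from 'startOffset', assumed to be the byte offset of a
    // "<Record>" tag (e.g. found by the binary search described above).
    public static List<string> RecordsFrom(string path, long startOffset)
    {
        var results = new List<string>();
        using var fs = File.OpenRead(path);
        fs.Position = startOffset;
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using var reader = XmlReader.Create(fs, settings);
        try
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Record")
                {
                    using var sub = reader.ReadSubtree();   // just this record
                    var doc = new XmlDocument();
                    doc.Load(sub);
                    results.Add(doc.OuterXml);
                }   // disposing 'sub' leaves the reader on </Record>
                if (!reader.Read())
                    break;
            }
        }
        catch (XmlException)
        {
            // The trailing </RecordSet> has no matching start tag in the
            // fragment we are reading, so the reader faults there: end of data.
        }
        return results;
    }
}
```

Each iteration pulls exactly one record into memory, which matches the space bound above: one record's worth of buffer plus the result.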