c# substring string-matching memorystream

Get substring from MemoryStream without converting entire stream to string


I would like to efficiently get a substring from a MemoryStream (which originally comes from an XML file in a zip). Currently, I read the entire MemoryStream into a string and then search for the start and end tags of the XML node I want. This works, but the file may be very large, so I would like to avoid converting the entire MemoryStream into a string and instead extract just the desired section of XML text directly from the stream.

What is the best way to go about this?

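// ZipFile.Read and ZipEntry.Extract(Stream) as used here match the DotNetZip API
// (using Ionic.Zip;); the snippet also needs using System; and using System.IO;
string xmlText;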
using (var zip = ZipFile.Read(zipFileName))
{
    var ze = zip[zipPath];
    using (var ms = new MemoryStream())
    {
        ze.Extract(ms);
        ms.Position = 0;
        using(var sr = new StreamReader(ms))
        {
            xmlText = sr.ReadToEnd();
        }
    }
}

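string startTag = "<someTag>";
string endTag = "</someTag>";
int startIndex = xmlText.IndexOf(startTag, StringComparison.Ordinal);
// IndexOf returns -1 when a tag is missing; only look for the end tag if the start tag was found
int endTagIndex = startIndex < 0 ? -1 : xmlText.IndexOf(endTag, startIndex, StringComparison.Ordinal);
if (endTagIndex >= 0)
{
    // Keep everything from the start tag through the end tag, inclusive
    xmlText = xmlText.Substring(startIndex, endTagIndex + endTag.Length - startIndex);
}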

Solution

  • If your file is valid XML, you should be able to use an XmlReader to pull out just the node you need, without converting the entire stream to a string

    string xmlText = null; // stays null if <someTag> is not found
    using (var zip = ZipFile.Read(zipFileName))
    {
        var ze = zip[zipPath];
        using (var ms = new MemoryStream())
        {
            ze.Extract(ms);
            ms.Position = 0;
            using (var xml = XmlReader.Create(ms))
            {
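                if (xml.ReadToFollowing("someTag"))
                {
                    // ReadInnerXml() returns only the markup between the tags;
                    // use ReadOuterXml() to include <someTag>...</someTag> itself.
                    xmlText = xml.ReadInnerXml();
                }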
                else
                {
                    // <someTag> not found
                }
            }
        }
    }
    

    You'll likely want to catch XmlException in case the file is not valid XML.
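
As a rough, self-contained illustration of the approach above, the sketch below wraps the reader logic in a hypothetical helper, `ExtractElement` (the name and the return-null-on-failure convention are assumptions, not part of the original answer). It uses `ReadOuterXml()` so the result includes the enclosing tags, as the IndexOf/Substring version in the question does, and it catches `XmlException` for malformed input. The sample document in `Main` is made up for demonstration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class XmlFragmentDemo
{
    // Hypothetical helper: returns the first <tagName> element, tags included,
    // or null if the element is missing or the XML read so far is malformed.
    public static string ExtractElement(Stream stream, string tagName)
    {
        try
        {
            using (var xml = XmlReader.Create(stream))
            {
                // ReadToFollowing advances node by node until it reaches the element
                return xml.ReadToFollowing(tagName) ? xml.ReadOuterXml() : null;
            }
        }
        catch (XmlException)
        {
            // Thrown by XmlReader when it encounters markup that is not well-formed
            return null;
        }
    }

    static void Main()
    {
        // Made-up sample document standing in for the extracted zip entry
        var bytes = Encoding.UTF8.GetBytes(
            "<root><other>1</other><someTag><a>x</a></someTag></root>");
        using (var ms = new MemoryStream(bytes))
        {
            Console.WriteLine(ExtractElement(ms, "someTag") ?? "<someTag> not found");
        }
    }
}
```

Because the reader is disposed as soon as the element has been read, malformed markup that appears later in the document would go undetected by this sketch.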