.net, string, file, large-files

Read large text file until a certain string


I have a large text file whose records are separated by a string (not by a single character), like this:

first data[STRING-SEPERATOR]second data[STRING-SEPERATOR] ...

I don't want to load the entire file into memory because of its size (~250 MB). If I read the entire file with System.IO.File.ReadAllText, I get an OutOfMemoryException.

Therefore I want to read the file until the first appearance of [STRING-SEPERATOR], then proceed with the next string. It's like "taking" the first data off the file, processing it, and then going on with the second data, which is now the first data of the file.

System.IO.StreamReader.ReadLine() doesn't help me because the entire content of the file is a single line.

Do you have an idea how to read a file up to a certain string in .NET?

I hope for some ideas, thank you.


Solution

  • Thank you for your replies. Here's the function I wrote in VB.NET:

    ' Reads from the stream's current position up to the next occurrence of UntilText.
    ' Returns the text before the separator and leaves the stream positioned right after it.
    ' Assumes a single-byte encoding: the rewind below treats one character as one byte.
    Public Function ReadUntil(Stream As System.IO.FileStream, UntilText As String) As String
        Dim builder As New System.Text.StringBuilder()            ' search window
        Dim returnTextBuilder As New System.Text.StringBuilder()  ' everything read so far
        ' Read in chunks of about half the separator length, but at least one byte
        ' (a zero-length buffer would make Stream.Read return 0 immediately).
        Dim size As Integer = System.Math.Max(1, UntilText.Length \ 2)
        Dim buffer(size - 1) As Byte
        Dim currentRead As Integer = -1
    
        Do Until currentRead = 0
            currentRead = Stream.Read(buffer, 0, buffer.Length)
            Dim chars As String = System.Text.Encoding.Default.GetString(buffer, 0, currentRead)
    
            builder.Append(chars)
            returnTextBuilder.Append(chars)
    
            Dim collected As String = builder.ToString()
    
            If collected.IndexOf(UntilText) >= 0 Then
                Dim returnText As String = returnTextBuilder.ToString()
                Dim indexOfSep As Integer = returnText.IndexOf(UntilText)
                Dim cutLength As Integer = returnText.Length - indexOfSep
    
                ' Rewind over any characters that were read beyond the separator.
                If cutLength > UntilText.Length Then
                    Stream.Position = Stream.Position - (cutLength - UntilText.Length)
                End If
    
                Return returnText.Remove(indexOfSep, cutLength)
            ElseIf collected.IndexOf(UntilText(0)) < 0 Then
                ' The window holds no possible separator start, so discard it.
                builder.Length = 0
            End If
        Loop
    
        ' End of file: return whatever follows the last separator (may be empty).
        Return returnTextBuilder.ToString()
    End Function
    

    C#

    // Reads from the stream's current position up to the next occurrence of UntilText.
    // Returns the text before the separator and leaves the stream positioned right after it.
    // Assumes a single-byte encoding: the rewind below treats one character as one byte.
    public static string ReadUntil(System.IO.FileStream Stream, string UntilText)
    {
        System.Text.StringBuilder builder = new System.Text.StringBuilder();           // search window
        System.Text.StringBuilder returnTextBuilder = new System.Text.StringBuilder(); // everything read so far
        // Read in chunks of about half the separator length, but at least one byte
        // (a zero-length buffer would make Stream.Read return 0 immediately).
        int size = System.Math.Max(1, UntilText.Length / 2);
        byte[] buffer = new byte[size];
        int currentRead = -1;
    
        while (currentRead != 0)
        {
            currentRead = Stream.Read(buffer, 0, buffer.Length);
            string chars = System.Text.Encoding.Default.GetString(buffer, 0, currentRead);
    
            builder.Append(chars);
            returnTextBuilder.Append(chars);
    
            string collected = builder.ToString();
    
            if (collected.IndexOf(UntilText) >= 0)
            {
                string returnText = returnTextBuilder.ToString();
                int indexOfSep = returnText.IndexOf(UntilText);
                int cutLength = returnText.Length - indexOfSep;
    
                // Rewind over any characters that were read beyond the separator.
                if (cutLength > UntilText.Length)
                    Stream.Position = Stream.Position - (cutLength - UntilText.Length);
    
                return returnText.Remove(indexOfSep, cutLength);
            }
            else if (collected.IndexOf(UntilText[0]) < 0)
            {
                // The window holds no possible separator start, so discard it.
                builder.Length = 0;
            }
        }
    
        // End of file: return whatever follows the last separator (may be empty).
        return returnTextBuilder.ToString();
    }
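
The function above works on raw bytes and moves Stream.Position, which only lines up with characters when the file uses a single-byte encoding. For files in UTF-8 or UTF-16, one possible alternative is to read characters through a TextReader and split on the separator as the data arrives. The following is only a minimal sketch under that assumption; the names SeparatedReader, ReadRecords and bufferSize are made up for illustration and are not part of the original post or of .NET.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SeparatedReader
{
    // Hypothetical helper: lazily yields each record from 'reader',
    // splitting on the multi-character 'separator'.
    public static IEnumerable<string> ReadRecords(TextReader reader, string separator, int bufferSize = 4096)
    {
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must not be empty.", nameof(separator));

        var pending = new StringBuilder();   // text read but not yet returned
        var buffer = new char[bufferSize];
        int read;

        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            pending.Append(buffer, 0, read);
            string text = pending.ToString();
            int start = 0;
            int index;

            // Yield every complete record currently held in memory.
            while ((index = text.IndexOf(separator, start, StringComparison.Ordinal)) >= 0)
            {
                yield return text.Substring(start, index - start);
                start = index + separator.Length;
            }

            // Keep only the unfinished tail; it may end with part of a separator.
            pending.Clear();
            pending.Append(text, start, text.Length - start);
        }

        // Data after the last separator, if any, is the final record.
        if (pending.Length > 0)
            yield return pending.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        // A StringReader stands in for the file here. For a real file one would
        // pass a StreamReader opened with the file's actual encoding instead.
        string sample = "first data[STRING-SEPERATOR]second data[STRING-SEPERATOR]third data";
        using (var reader = new StringReader(sample))
        {
            // The tiny buffer forces the separator to span several reads.
            foreach (string record in SeparatedReader.ReadRecords(reader, "[STRING-SEPERATOR]", bufferSize: 8))
                Console.WriteLine(record);
        }
    }
}
```

Only the current unfinished record and one buffer are held in memory, so the 250 MB file is never loaded as a whole. The sketch rebuilds the pending string after every read, so it would probably need a smarter search if single records can be very large.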