
Separating large file and inserting carriage returns based on string


I'm new to VB.NET, but a friend recommended I use it for what I'm trying to do. I have a huge text file, and I want to insert carriage returns after a specific string.

Apart from the mess I have below, how would I alter this code so that it reads a file and, whenever it sees the text "ext", inserts a line break? I'm expecting one of the lines in the input file to produce a lot of carriage returns.

What I've managed to cobble together so far (below) reads an input file line by line until the end of the file and writes each line out again to another file.

Imports System.IO

Module Module1
    Sub Main()
        Try
            ' Create an instance of StreamReader to read from a file.
            ' The Using statement also closes the StreamReader.
            Using sr As StreamReader = New StreamReader("C:\My Documents\input.txt")
                Dim line As String
                ' Create the output file; Using also closes the StreamWriter.
                Using sw As StreamWriter = New StreamWriter("C:\My Documents\output.txt")
                    ' Copy lines from input to output until the end of the file is reached.
                    Do Until sr.EndOfStream
                        line = sr.ReadLine()
                        sw.WriteLine(line)
                    Loop
                End Using
            End Using
            Console.WriteLine("done")
        Catch e As Exception
            ' Let the user know what went wrong.
            Console.WriteLine("The file could not be read:")
            Console.WriteLine(e.Message)
        End Try
        Console.ReadKey()
    End Sub
End Module

Changes made following the comments. This version falls over on 500 MB files due to memory constraints:

    Sub Main()
        Try
            ' Create an instance of StreamReader to read from a file.
            ' The Using statement also closes the StreamReader.
            Using sr As StreamReader = New StreamReader("C:\My Documents\input.txt")
                Dim line As String
                Dim term As String = "</ext>"

                Using sw As StreamWriter = New StreamWriter("C:\My Documents\output.txt")
                    ' ReadLine returns each whole line as one String, so a single
                    ' very long line has to fit in memory in its entirety.
                    Do Until sr.EndOfStream
                        line = sr.ReadLine()
                        ' Append a line break after every occurrence of the term.
                        line = line.Replace(term, term & Environment.NewLine)
                        sw.WriteLine(line)
                    Loop
                End Using
            End Using
            Console.WriteLine("done")
        Catch e As Exception
            Console.WriteLine("The file could not be read:")
            Console.WriteLine(e.Message)
        End Try
        Console.ReadKey()
    End Sub

Solution

  • Since your lines are very big, ReadLine would have to hold a whole line in memory at once. Instead:

    • Read and write one character at a time
    • Keep the last x characters, where x is the length of your term
    • If the last x characters equal your term, write a new line
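The three steps above can be sketched as a routine that works on any TextReader/TextWriter pair, so it can be tried on a small in-memory string before it touches the real file. This is a minimal sketch, not the answer's original code: the names BreakInserter and InsertBreaksAfter and the sample input are hypothetical, and it assumes the term contains no NUL character (the window starts out filled with NULs).

```vbnet
Imports System
Imports System.IO

Module BreakInserter
    ' Copies reader to writer, writing a line break right after every
    ' occurrence of term. Only the last term.Length characters are kept
    ' in memory, so the length of the input's lines does not matter.
    Sub InsertBreaksAfter(reader As TextReader, writer As TextWriter, term As String)
        If String.IsNullOrEmpty(term) Then
            Throw New ArgumentException("term must not be empty", NameOf(term))
        End If

        ' Sliding window of the most recent characters. It starts out as NULs
        ' so it cannot match before term.Length real characters have been read
        ' (assumes term itself contains no NUL character).
        Dim window As New String(Char.MinValue, term.Length)

        ' TextReader.Read returns the next character as an Integer, or -1 at the end.
        Dim code As Integer = reader.Read()
        While code <> -1
            Dim ch As Char = Convert.ToChar(code)
            writer.Write(ch)

            ' Drop the oldest character and append the newest one.
            window = window.Substring(1) & ch

            If window = term Then
                writer.Write(Environment.NewLine)
            End If

            code = reader.Read()
        End While
    End Sub

    Sub Main()
        ' Hypothetical sample input standing in for the real file.
        Using reader As New StringReader("<a>1</ext><a>2</ext>")
            Using writer As New StringWriter()
                InsertBreaksAfter(reader, writer, "</ext>")
                Console.Write(writer.ToString())
            End Using
        End Using
    End Sub
End Module
```

Running Main prints `<a>1</ext>` and `<a>2</ext>` on separate lines. To process the real file, pass a StreamReader and a StreamWriter opened on the input and output paths instead of the StringReader/StringWriter.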

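      Imports System
      Imports System.IO

      Module Module1
          Sub Main()
              Dim term As String = "</ext>"
              ' Sliding window holding the last term.Length characters written.
              Dim lastChars As String = "".PadRight(term.Length)

              Using sw As StreamWriter = New StreamWriter("C:\My Documents\output.txt")
                  Using sr As New StreamReader("C:\My Documents\input.txt")
                      ' Read() returns the next character as an Integer, or -1 at end of file.
                      Dim code As Integer = sr.Read()
                      While code <> -1
                          Dim ch As Char = Convert.ToChar(code)

                          ' Append the new character and drop the oldest one.
                          lastChars &= ch
                          lastChars = lastChars.Remove(0, 1)

                          sw.Write(ch)

                          ' The last characters written are the term: break the line here.
                          If lastChars = term Then
                              sw.Write(Environment.NewLine)
                          End If

                          code = sr.Read()
                      End While
                  End Using
              End Using
          End Sub
      End Module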
      

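    Note: StreamReader decodes the file into characters using its encoding (UTF-8 by default, or the encoding indicated by a byte-order mark), so this compares characters rather than bytes and works with Unicode text. If the file uses some other encoding, pass that encoding to the StreamReader constructor (and to the StreamWriter if the output should match); otherwise non-ASCII characters will not be copied correctly.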
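A quick way to see that StreamReader.Read works on decoded characters rather than bytes is to write a string containing a non-ASCII character to a file as UTF-8 and count what Read returns. A minimal sketch, assuming a throwaway temp file; the module name, function name, and sample string are purely illustrative. "é" takes two bytes in UTF-8 but is a single Char.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module DecodingCheck
    ' Writes text to a temporary file as UTF-8, then counts how many values
    ' StreamReader.Read returns for it (one per decoded Char).
    Function CountCharsRead(text As String) As Integer
        Dim tempFile As String = Path.GetTempFileName()
        Try
            File.WriteAllText(tempFile, text, Encoding.UTF8)
            Dim count As Integer = 0
            ' Default StreamReader: UTF-8, with byte-order-mark detection.
            Using sr As New StreamReader(tempFile)
                While sr.Read() <> -1
                    count += 1
                End While
            End Using
            Return count
        Finally
            File.Delete(tempFile)
        End Try
    End Function

    Sub Main()
        ' Illustrative sample: "caf" + U+00E9 (é) + "</ext>".
        Dim sample As String = "caf" & Convert.ToChar(&HE9) & "</ext>"
        Console.WriteLine(Encoding.UTF8.GetByteCount(sample)) ' UTF-8 bytes, excluding the BOM
        Console.WriteLine(CountCharsRead(sample))              ' characters returned by Read
    End Sub
End Module
```

Main prints 11 (bytes on disk, not counting the byte-order mark) and then 10 (characters returned by Read), so the sliding-window comparison sees whole characters.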