vb.net, file, search, ms-word, streamreader

VB.Net: Searching Word Document By Line


I'm trying to read through a Word document (800+ pages) line by line and, whenever a line contains certain text (in this case "Section"), print that line to the console.

Public Sub doIt()
    SearchFile("theFilePath", "Section")
    Console.WriteLine("SHit")
End Sub

Public Sub SearchFile(ByVal strFilePath As String, ByVal strSearchTerm As String)
    Dim sr As StreamReader = New StreamReader(strFilePath)
    Dim strLine As String = String.Empty

    For Each line As String In sr.ReadLine
        If line.Contains(strSearchTerm) = True Then
            Console.WriteLine(line)
        End If
    Next

End Sub

It runs, but it doesn't print out anything. I know the word "Section" is in there multiple times as well.


Solution

  • As already mentioned in the comments, you can't search a Word document the way you are currently doing it. You need to create a Word.Application object and then load the document so you can search it.
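
    To see why the original loop prints nothing: sr.ReadLine returns a single String, so For Each ... In sr.ReadLine walks the characters of the first line, and no single character can contain "Section". On top of that, a .docx file is a ZIP package of XML parts (and the older .doc format is binary), so StreamReader never sees the visible text as lines. For comparison, here is a minimal sketch of a line-by-line search that does work, but only for plain-text files, not Word documents. The module name, the FindLines helper, and the sample file are hypothetical.

        Imports System
        Imports System.Collections.Generic
        Imports System.IO

        Module PlainTextSearchSketch
            ' Returns every line of a plain-text file that contains searchTerm.
            Function FindLines(ByVal filePath As String, ByVal searchTerm As String) As List(Of String)
                Dim matches As New List(Of String)
                Using sr As New StreamReader(filePath)
                    Dim line As String = sr.ReadLine()
                    ' ReadLine returns Nothing once the end of the file is reached
                    While line IsNot Nothing
                        If line.Contains(searchTerm) Then matches.Add(line)
                        line = sr.ReadLine()
                    End While
                End Using
                Return matches
            End Function

            Sub Main()
                ' Hypothetical sample file, created only for this demonstration
                Dim tempFile As String = Path.GetTempFileName()
                File.WriteAllLines(tempFile, New String() {"Section 1", "Intro", "See Section 2"})

                For Each hit As String In FindLines(tempFile, "Section")
                    Console.WriteLine(hit)
                Next

                File.Delete(tempFile)
            End Sub
        End Module

    Run against the sample file, this prints the first and third lines. Pointed at a Word document it would still find nothing useful, which is why the interop approach is needed.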

    Here is a short example I wrote for you. Note that you need to add a reference to Microsoft.Office.Interop.Word and then add the matching import statement to your class, for example Imports Microsoft.Office.Interop. The code grabs each paragraph and uses its range to look for the text you are searching for; if the text is found, the paragraph is added to the list.

    Note: Tried and tested - I had this in a button event, but put it where you need it.

        'Requires a reference to Microsoft.Office.Interop.Word and
        '"Imports Microsoft.Office.Interop" at the top of the file
        Dim objWordApp As Word.Application = Nothing
        Dim objDoc As Word.Document = Nothing
        Dim TextToFind As String = "YOUR TEXT TO FIND"
        Dim TextRange As Word.Range = Nothing
        Dim StringLines As New List(Of String)

        Try
            objWordApp = New Word.Application()
            objWordApp.Visible = False
            'FileName holds the full path of the document to search
            objDoc = objWordApp.Documents.Open(FileName)

            'loop through each paragraph in the document and get the range
            For Each p As Word.Paragraph In objDoc.Paragraphs
                TextRange = p.Range
                TextRange.Find.ClearFormatting()

                'Execute returns True when the text occurs in this paragraph
                If TextRange.Find.Execute(TextToFind) Then
                    StringLines.Add(p.Range.Text)
                End If
            Next

            If StringLines.Count > 0 Then
                MessageBox.Show(String.Join(Environment.NewLine, StringLines.ToArray()))
            End If

        Catch ex As Exception
            'publish your exception?
        Finally
            'always close the document and quit Word, even after an error
            If objDoc IsNot Nothing Then objDoc.Close(SaveChanges:=False)
            If objWordApp IsNot Nothing Then objWordApp.Quit()
        End Try
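
    Find.Execute also takes further optional arguments, which VB lets you pass by name. As a sketch only, here is a helper that limits matches to whole words with exact case. It assumes the same Microsoft.Office.Interop.Word reference as the code above; the module and the ParagraphContains name are hypothetical.

        Imports Microsoft.Office.Interop

        Module WordFindSketch
            'Hypothetical helper: True if the paragraph contains text as a whole word, matching case
            Function ParagraphContains(ByVal p As Word.Paragraph, ByVal text As String) As Boolean
                Dim rng As Word.Range = p.Range
                rng.Find.ClearFormatting()

                'Find.Execute declares its arguments as ByRef Object, so pass an Object variable
                Dim findText As Object = text

                'MatchCase and MatchWholeWord are optional parameters of Find.Execute
                Return rng.Find.Execute(FindText:=findText, MatchCase:=True, MatchWholeWord:=True)
            End Function
        End Module

    Inside the paragraph loop, the condition would then read If ParagraphContains(p, TextToFind) Then.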
    

    Update to use Sentences: this goes through each paragraph and grabs each sentence, so we can check whether the text exists in that sentence. The benefit is that only the matching sentence is collected rather than the whole paragraph, and the search should be quicker because we get each paragraph and then search only its sentences. We have to get the paragraph first in order to get its sentences.

        Dim objWordApp As Word.Application = Nothing
        Dim objDoc As Word.Document = Nothing
        Dim TextToFind As String = "YOUR TEXT TO FIND"
        Dim TextRange As Word.Range = Nothing
        Dim StringLines As New List(Of String)

        Try
            objWordApp = New Word.Application()
            objWordApp.Visible = False
            'FileName holds the full path of the document to search
            objDoc = objWordApp.Documents.Open(FileName)

            For Each p As Word.Paragraph In objDoc.Paragraphs
                TextRange = p.Range

                'Sentences is 1-based; walk it front to back to keep document order
                For i As Integer = 1 To TextRange.Sentences.Count
                    Dim sentence As String = TextRange.Sentences.Item(i).Text
                    If sentence.Contains(TextToFind) Then
                        StringLines.Add(sentence.Trim())
                    End If
                Next
            Next

            If StringLines.Count > 0 Then
                MessageBox.Show(String.Join(Environment.NewLine, StringLines.ToArray()))
            End If

        Catch ex As Exception
            'publish your exception?
        Finally
            'always close the document and quit Word, even after an error
            If objDoc IsNot Nothing Then objDoc.Close(SaveChanges:=False)
            If objWordApp IsNot Nothing Then objWordApp.Quit()
        End Try
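
    One thing to watch: String.Contains is case-sensitive, so a sentence that only mentions "section" in lower case is skipped by the sentence check above. A minimal sketch of a case-insensitive test that needs no Word objects; the module and the ContainsIgnoreCase name are hypothetical.

        Imports System

        Module SentenceMatchSketch
            'Hypothetical helper: True if sentence contains term, ignoring case
            Function ContainsIgnoreCase(ByVal sentence As String, ByVal term As String) As Boolean
                Return sentence.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0
            End Function

            Sub Main()
                Dim sample As String = "See section 4 for details."
                Console.WriteLine(ContainsIgnoreCase(sample, "Section")) 'True
                Console.WriteLine(sample.Contains("Section"))            'False
            End Sub
        End Module

    Swapping sentence.Contains(TextToFind) for ContainsIgnoreCase(sentence, TextToFind) would apply this inside the loop.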