
PowerShell: search for a matching string in a Word document


I have a simple requirement. I need to search for a string in a Word document and, as the result, get the matching line or a few words around the match.

So far I can successfully search for a string in a folder containing Word documents, but the search only returns True / False depending on whether the string was found.

#ERROR REPORTING ALL
Set-StrictMode -Version latest
$path     = "c:\MORLAB"
$files    = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
$output   = "c:\wordfiletry.txt"
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "CRHPCD01"

Function getStringMatch
{
  # Loop through all *.doc files in the $path directory
  Foreach ($file In $files)
  {
   $document = $application.documents.open($file.FullName,$false,$true)
   $range = $document.content
   $wordFound = $range.find.execute($findText)

   if($wordFound) 
    { 
     "$file.fullname has $wordfound" | Out-File $output -Append
    }

  }
$document.close()
$application.quit()
}

getStringMatch

Solution

  • #ERROR REPORTING ALL
    Set-StrictMode -Version latest
    $path     = "c:\Temp"
    $files    = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
    $output   = "c:\temp\wordfiletry.csv"
    $application = New-Object -comobject word.application
    $application.visible = $False
    $findtext = "First"
    $charactersAround = 30
    # Array that collects one result object per matching document
    $results = @()
    
    Function getStringMatch
    {
        # Loop through all *.doc and *.docx files in the $path directory
        Foreach ($file In $files)
        {
            $document = $application.documents.open($file.FullName,$false,$true)
            $range = $document.content
    
            # Match up to $charactersAround characters on each side of the search text.
            # [regex]::Escape makes special characters in $findtext match literally.
            If($range.Text -match ".{0,$($charactersAround)}$([regex]::Escape($findtext)).{0,$($charactersAround)}"){
                 $properties = @{
                    File = $file.FullName
                    Match = $findtext
                    TextAround = $Matches[0] 
                 }
                 $results += New-Object -TypeName PSObject -Property $properties
            }
    
            # Close each document before opening the next one
            $document.close()
        }
    
        If($results){
            $results | Export-Csv $output -NoTypeInformation
        }
    
        $application.quit()
    }
    
    getStringMatch
    
    import-csv $output
    

    There are a couple of ways to get what you want. Since you already have the text of the document, a simple approach is to run a regex match against it and return the match together with the text around it. That addresses the requirement of getting some words around the match in the document.

    The variable $charactersAround sets the number of characters to match on either side of $findtext. I also thought the output was a better fit for a CSV file, so I used $results to collect the properties of each match, which are written to a CSV file at the end.
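To see how the pattern behaves without opening Word, here is a minimal sketch that runs the same kind of match against a plain string. The sample text and the value 10 are made up for illustration and stand in for $range.Text and your own setting. The sketch uses .{0,n} so that matches near the start or end of the text are still found, and escapes the search text in case it contains regex special characters.

```powershell
# Hypothetical sample text standing in for $range.Text
$text = "Bradley Air Services Limited dba First Air meets or exceeds all terms"
$findtext = "First"
$charactersAround = 10

# Up to $charactersAround characters, the literal search text, then up to
# $charactersAround characters again
$pattern = ".{0,$charactersAround}" + [regex]::Escape($findtext) + ".{0,$charactersAround}"

if ($text -match $pattern) {
    # $Matches[0] holds the search text plus the characters around it
    $Matches[0]
}
```

Note that -match is case-insensitive by default; use -cmatch if the search should be case-sensitive.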

    Be sure to change the variables for your own testing. Because the matches are now located with a regex, you can adapt the pattern to your needs, for example to capture whole words instead of a fixed number of characters.

    Sample Output

    Match TextAround                                                        File                          
    ----- ----------                                                        ----                          
    First dley Air Services Limited dba First Air meets or exceeds all term C:\Temp\20120315132117214.docx
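The script above records only the first match in each document, because -match stops at the first hit. If you need every occurrence, one option is the .NET [regex]::Matches method. The following is a rough sketch on a made-up string rather than a real document; inside the loop you would pass $range.Text instead of $text.

```powershell
# Hypothetical sample text standing in for $range.Text
$text = "First draft sent. The First Air contract was signed first thing on Monday."
$findtext = "First"
$charactersAround = 8

$pattern = ".{0,$charactersAround}" + [regex]::Escape($findtext) + ".{0,$charactersAround}"

# [regex]::Matches is case-sensitive unless told otherwise, so pass IgnoreCase
# to mirror the default behaviour of the -match operator
$options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase

# Collect the text of every non-overlapping match into an array
$snippets = @([regex]::Matches($text, $pattern, $options) | ForEach-Object { $_.Value })
$snippets
```

Matches do not overlap, so two occurrences that sit closer together than $charactersAround characters may end up in the same snippet.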