Tags: powershell, powershell-4.0

Scan txt file for multiple strings and save the following lines


I have a problem that I am trying to solve, but with my nonexistent PowerShell knowledge it is proving harder than I had hoped, so any help would be appreciated.

The problem can be simplified as:

  1. Find a string in a text file

  2. Extract the information on the row after that string

  3. Store the information in a variable

  4. Find a second string in the text file and repeat the procedure

  5. Store both extracted lines in a new file, or delete everything else in the text file.

I then want to do this for approximately 20,000 files. I would love to have the extracted information listed under its keyword and comma-delimited so that I can import it into other systems.

My files look somewhat like the following:

random words 
that are unimportant 
Keyword
FirstlineofNumbersthatIwanttoExtract
random words again that are unimportant 
Secondkeyword
SecondLineOfNumbersThatIWantToExtract
end of the file 

However, the lines I want to extract are not on the same row in every file. I would like the output to be something like:

Keyword, SecondKeyword
FirstLineOfNumbersThatIWantToExtract, SecondLineOfNumbersThatIWantToExtract

And that would be it. This is as far as I got:

 [System.IO.DirectoryInfo]$folder = 'C:\users\xx\Desktop\mappcent3'

 foreach ($file in ($folder.EnumerateFiles())) {
     if ($file.Extension -eq '.txt') {

         $content = Get-Content $file

         $FirstRegex = 'KeyWordOne
    (.+)$'

         $First_output = "\1"
         $test = Select-String -Path $file.FullName -Pattern $FirstRegex

     }
 }

Solution

  • This should do something close to what you are asking. It requires PowerShell 3.0 or later.

    $path = 'C:\users\xx\Desktop\mappcent3'
    $firstKeyword = "Keyword"
    $secondKeyword = "Secondkeyword"
    $resultsPath = "C:\Temp\results.csv"
    Get-ChildItem $path -Filter "*.txt" | ForEach-Object{
        # Read the file in
        $fileContents = Get-Content $_.FullName
    
        # Find the first keyword data
        $firstKeywordData = ($fileContents | Select-String -Pattern $firstKeyword -Context 0,1 -SimpleMatch).Context.PostContext[0]
    
        # Find the second keyword data
        $secondKeywordData = ($fileContents | Select-String -Pattern $secondKeyword -Context 0,1 -SimpleMatch).Context.PostContext[0]
    
        # Create a new object with details gathered. 
        [pscustomobject][ordered]@{
            File = $_.FullName
            FirstKeywordData = $firstKeywordData
            SecondKeywordData = $secondKeywordData
        }
    
    } | Export-CSV $resultsPath -NoTypeInformation
    

    Select-String does most of the work here. We take advantage of -Context, which captures lines before and after each match. We only want the line that follows the keyword, which is why we use 0,1 (zero lines before, one line after). Wrap the results in a custom object and we can export them to a CSV file.
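
    To see what -Context 0,1 returns in isolation, here is a minimal sketch that runs against an in-memory array instead of a file. The sample lines are borrowed from the question, and the variable names $lines and $match are only illustrative.

```powershell
# Sample lines standing in for one file's contents (borrowed from the question).
$lines = @(
    'random words'
    'Keyword'
    'FirstlineofNumbersthatIwanttoExtract'
    'end of the file'
)

# -Context 0,1 keeps zero lines before and one line after each match.
$match = $lines | Select-String -Pattern 'Keyword' -SimpleMatch -Context 0,1

$match.Line                    # the matching line itself: Keyword
$match.Context.PostContext[0]  # the line that follows: FirstlineofNumbersthatIwanttoExtract
```

    PostContext is an array because -Context can request more than one trailing line, so [0] picks the first of them.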

    Keyword Overlap

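    Beware that your keywords can overlap and create odd results in your output files. In your sample, Keyword also matches the Secondkeyword line, because Select-String is case-insensitive by default and -SimpleMatch finds the text anywhere in a line, so the result set would reflect that.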
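
    One possible way around the overlap, assuming each keyword sits alone on its line as in the sample, is to drop -SimpleMatch and use an anchored regular expression. The sketch below compares the two approaches on in-memory sample lines; $loose and $exact are illustrative names, and the \s* parts are there only to tolerate stray leading or trailing spaces.

```powershell
$lines = @(
    'Keyword'
    'FirstlineofNumbersthatIwanttoExtract'
    'Secondkeyword'
    'SecondLineOfNumbersThatIWantToExtract'
)

# Substring match: 'Keyword' is found in 'Keyword' and in 'Secondkeyword'.
$loose = @($lines | Select-String -Pattern 'Keyword' -SimpleMatch -Context 0,1)
$loose.Count   # 2

# Anchored regex: only a line consisting of 'Keyword' (plus optional
# surrounding whitespace) matches.
$exact = @($lines | Select-String -Pattern '^\s*Keyword\s*$' -Context 0,1)
$exact.Count   # 1
$exact[0].Context.PostContext[0]   # FirstlineofNumbersthatIwanttoExtract
```

    If your real keywords contain regex metacharacters, they would need escaping (for example with [regex]::Escape) before being placed between the anchors.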


    If you just wanted to write the two values back to the original file instead, you could easily do that inside the loop as well:

    "$firstKeywordData,$secondKeywordData" | Set-Content $_.FullName
    

    Or something similar.
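
    Since the question mentions that the files do not all share the same layout, some of them may be missing a keyword entirely, in which case indexing into PostContext fails. The following is a rough sketch of a guard for that case. Get-LineAfter is a hypothetical helper name, not something from the code above, and it returns nothing when the keyword is absent or is the last line.

```powershell
# Hypothetical helper: returns the line after the first line matching
# $Pattern (a regular expression), or $null when there is no such line.
function Get-LineAfter {
    param(
        [string[]]$Lines,
        [string]$Pattern
    )
    $hit = $Lines |
        Select-String -Pattern $Pattern -Context 0,1 |
        Select-Object -First 1

    # $hit is $null when nothing matched; PostContext is empty when the
    # match was the very last line.
    if ($hit -and $hit.Context.PostContext.Count -gt 0) {
        return $hit.Context.PostContext[0]
    }
    return $null
}

$lines = 'random words', 'Keyword', '12345', 'end of the file'

$first  = Get-LineAfter -Lines $lines -Pattern '^\s*Keyword\s*$'        # 12345
$second = Get-LineAfter -Lines $lines -Pattern '^\s*Secondkeyword\s*$'  # $null
```

    Inside the ForEach-Object loop, a helper like this could stand in for the two Select-String lines, leaving the corresponding CSV column empty for files that lack a keyword.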