
Replacing many strings in large files using a lookup table and switch


I found a solution in the question "Scanning log file using ForEach-Object and replacing text is taking a very long time".

However, I get an error when a key in the lookup table contains a parenthesis, either ( or ).

$lookupTable = @{
  "Hello (1234)" = "new string 1"
  "Some'thing (2023)" = "other"
}

$inputfile = "c:\somewhere\*.*"

Get-ChildItem $inputfile -Filter *.txt | ForEach-Object {
  $outfile = Join-Path -Path "c:\else\" -ChildPath ('{0}{1}_new' -f $_BaseName, $_.Extension)
  $regexLookup = '({0})' -f (($lookupTable.Keys | ForEach-Object { [regex]::escape($_) }) -join '|')
  $writer = [System.IO.StreamWriter]::new($outfile, $true)

  Switch -regex -file $_ {
    $regexLookup {
      $line = $_
      $match = [regex]::Match($line, $regexLookup)
      while ($match.Success) {
        $line = $line -replace $match.Value, $lookupTable[[regex]::Unescape($match.Value)]
        $match = $match.NextMatch()
      }
      $writer.WriteLine($line)
    }
    default { $write.Writeline($_) }
  }

  $writer.flush()
  $writer.Dispose()
}

The error I get is:

The regular expression pattern Hello (1234) is not valid.
At c:\wheremyfileis.ps1:....
+ $line = $line -replace $match.Value, $lookupTable[[regex ...
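The failing -replace call can be reproduced in isolation. $match.Value holds the literal text that was matched, and passing it unescaped as the pattern to -replace makes the regex engine read ( and ) as grouping syntax. The sketch below is illustrative only: it uses the Hello (1234) key from the question plus a hypothetical unbalanced key, 'Hello (1234', to show both possible outcomes.

```powershell
# A sample line and the literal text a match would return for it
$line  = 'prefix Hello (1234) suffix'
$value = 'Hello (1234)'

# Unescaped: the parentheses become a capture group, so the pattern looks for
# "Hello 1234" (no parentheses) and the line is returned unchanged.
$unescaped = $line -replace $value, 'new string 1'

# Escaped: [regex]::Escape turns the text into a pattern that matches it literally.
$escaped = $line -replace [regex]::Escape($value), 'new string 1'

# Hypothetical key with an unbalanced parenthesis: not a valid pattern at all,
# so -replace throws a "pattern ... is not valid" error.
$invalidError = $null
try {
    $null = $line -replace 'Hello (1234', 'x'
}
catch {
    $invalidError = $_.Exception.Message
}

$unescaped      # prefix Hello (1234) suffix
$escaped        # prefix new string 1 suffix
$invalidError
```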

Solution

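  • The issue arises because parentheses are special characters in regular expressions. Inside the while loop, $match.Value holds the literal text that was matched (for example Hello (1234)), and that text is passed unescaped as the pattern to -replace, so its parentheses are read as regex syntax instead of literal characters. Depending on the key, that either makes the pattern invalid (the error above) or turns it into a pattern that no longer matches the literal text.

    You already call [regex]::escape($_), but only while building $regexLookup. That part works, which is why switch and [regex]::Match find the right lines; the escaping just isn't applied to the value you later pass to -replace. I've made some adjustments to your code, and it now works as expected on my machine: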

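    $lookupTable = @{
      "Hello (1234)" = "new string 1"
      "Some'thing (2023)" = "other"
    }
    
    $inputfile = "c:\somewhere\*.*"
    
    # Build the alternation once; each key is escaped so its parentheses are literal
    $regexLookup = '({0})' -f (($lookupTable.Keys | ForEach-Object { [regex]::escape($_) }) -join '|')
    
    Get-ChildItem $inputfile -Filter *.txt | ForEach-Object {
      # Keep the original extension: file.txt -> file_new.txt
      $outfile = Join-Path -Path "c:\else\" -ChildPath ('{0}_new{1}' -f $_.BaseName, $_.Extension)
      $writer = [System.IO.StreamWriter]::new($outfile, $true)
    
      try {
        # FullName passes the full path to switch -File
        Switch -regex -file $_.FullName {
          $regexLookup {
            $line = $_
            $match = [regex]::Match($line, $regexLookup)
            while ($match.Success) {
              # $match.Value is literal text, so escape it before using it as a -replace pattern
              $escapedMatch = [regex]::Escape($match.Value)
              # The lookup table's keys are unescaped, so look up $match.Value as-is
              $line = $line -replace $escapedMatch, $lookupTable[$match.Value]
              $match = $match.NextMatch()
            }
            $writer.WriteLine($line)
          }
          default { $writer.WriteLine($_) }
        }
      }
      finally {
        # Dispose flushes buffered output and closes the file, even if an error occurs
        $writer.Dispose()
      }
    }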
    

    I changed the -replace operation to use the escaped version of $match.Value, so any special characters in the matched text are treated literally. The key used to look up the replacement in $lookupTable is now $match.Value itself: it is already the literal text from the line, so it doesn't need [regex]::Unescape. I also fixed two typos ($_BaseName should be $_.BaseName, and $write should be $writer) and changed the output name so the file extension stays the same, with _new inserted before the final period (file.txt becomes file_new.txt).
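As an alternative to the while loop (this is not what the answer above does), every match can be replaced in a single pass with the .NET [regex] Replace method and a MatchEvaluator script block that looks the matched text up in $lookupTable. This is a minimal sketch assuming the keys are literal strings; the variable names $regex and $evaluator and the sample line are illustrative. Because the evaluator's return value is inserted verbatim, a replacement string containing $ is not treated as a substitution such as $1, which -replace would do.

```powershell
# Same lookup table as in the question
$lookupTable = @{
    "Hello (1234)"      = "new string 1"
    "Some'thing (2023)" = "other"
}

# Escape every key; sorting longest-first (an added assumption) lets a longer
# key win when one key is a prefix of another.
$escapedKeys = $lookupTable.Keys |
    Sort-Object -Property Length -Descending |
    ForEach-Object { [regex]::Escape($_) }
$regex = [regex]::new($escapedKeys -join '|')

# Called once per match; whatever it returns replaces that match literally.
$evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
    param($m)
    $lookupTable[$m.Value]
}

# One pass over the line replaces every occurrence of every key.
$result = $regex.Replace("Hello (1234) and Some'thing (2023), Hello (1234)", $evaluator)
$result   # new string 1 and other, new string 1
```

Inside the switch, the body of the $regexLookup branch could then shrink to $writer.WriteLine($regex.Replace($_, $evaluator)). The longest-first sort matters because an alternation tries its alternatives in order and takes the first one that matches at a given position.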