I have a text file with a large number of log messages. I want to extract the messages between two string patterns, and I want the extracted text to appear exactly as it does in the text file.
I tried the following approach. It works, but it doesn't support Get-Content's -Wait and -Tail options. Also, the extracted result is displayed on a single line rather than with the line breaks of the text file. Inputs are welcome :-)
Sample Code
function GetTextBetweenTwoStrings($startPattern, $endPattern, $filePath){
    # Get content from the input file
    $fileContent = Get-Content $filePath
    # Regular expression (Regex) of the given start and end patterns
    $pattern = "$startPattern(.*?)$endPattern"
    # Perform the Regex operation
    $result = [regex]::Match($fileContent,$pattern).Value
    # Finally return the result to the caller
    return $result
}
# Clear the screen
Clear-Host
$input = "THE-LOG-FILE.log"
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'
# Call the function
GetTextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $input
Improved script based on Theo's answer. The following point still needs to be improved: support for the -Wait and -Tail options.
Updated Script
# Clear the screen
Clear-Host
# Adjust the buffer size of the window
$bw = 10000
$bh = 300000
if ($host.name -eq 'ConsoleHost') # or -notmatch 'ISE'
{
    [console]::bufferwidth = $bw
    [console]::bufferheight = $bh
}
else
{
    $pshost = get-host
    $pswindow = $pshost.ui.rawui
    $newsize = $pswindow.buffersize
    $newsize.height = $bh
    $newsize.width = $bw
    $pswindow.buffersize = $newsize
}
function Get-TextBetweenTwoStrings ([string]$startPattern, [string]$endPattern, [string]$filePath){
    # Get content from the input file
    $fileContent = Get-Content -Path $filePath -Raw
    # Regular expression (Regex) of the given start and end patterns
    $pattern = '(?is){0}(.*?){1}' -f [regex]::Escape($startPattern), [regex]::Escape($endPattern)
    # Perform the Regex operation and output
    [regex]::Match($fileContent,$pattern).Groups[1].Value
}
# Input file path
$inputFile = "THE-LOG-FILE.log"
# The patterns
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'
Get-TextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $inputFile
You need to perform streaming processing of your Get-Content call, in a pipeline, such as with ForEach-Object, if you want to process lines as they're being read.
This is a necessity if you're using Get-Content -Wait, because such a call doesn't terminate by itself (it keeps waiting for new lines to be added to the file, indefinitely), but inside a pipeline its output can be processed as it is being received, even before the command terminates.
You're trying to match across multiple lines, which with Get-Content output would only work if you used the -Raw switch; by default, Get-Content reads its input file(s) line by line.
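To make the difference concrete, here is a minimal sketch that compares the default line-by-line output with -Raw. The temporary file name and its contents are made up for the demonstration. It also shows why the original attempt printed everything on one line: when an array of lines is passed to a [string] method parameter, PowerShell joins the elements with spaces.

```powershell
# Hypothetical sample file, created only for this demonstration.
$demoFile = Join-Path ([System.IO.Path]::GetTempPath()) 'raw-vs-lines-demo.log'
Set-Content -Path $demoFile -Value 'START-OF-PATTERN', 'line 1', 'line 2', 'END-OF-PATTERN'

# Default: an array of strings, one element per line.
$lines = Get-Content -Path $demoFile
# With -Raw: a single string that still contains the newlines.
$raw = Get-Content -Path $demoFile -Raw

$pattern = '(?s)START-OF-PATTERN(.*?)END-OF-PATTERN'

# The array is converted to one string by joining its elements with spaces,
# so the line breaks are gone before the regex ever sees the text.
$fromLines = [regex]::Match($lines, $pattern).Value
# The raw string keeps its line breaks, so the match does too.
$fromRaw = [regex]::Match($raw, $pattern).Value

Remove-Item -Path $demoFile
```

Comparing $fromLines and $fromRaw shows the same text, once flattened with spaces and once with the original line breaks.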
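Note that -Raw is incompatible with -Wait, so you have to stick with line-by-line processing: match the start and end patterns separately and keep track of whether you are currently between them.
Here's a proof of concept, but note the following: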
-Tail 100 is hard-coded - adjust as needed or make it another parameter.
The use of -Wait means that the function will run indefinitely - waiting for new lines to be added to $filePath - so you'll need to use Ctrl-C to stop it.
While you can use a Get-TextBetweenTwoStrings call itself in a pipeline for object-by-object processing, assigning its result to a variable ($result = ...) won't work when terminating with Ctrl-C, because this method of termination also aborts the assignment operation.
To work around this limitation, the function below is defined as an advanced function, which automatically enables support for the common -OutVariable parameter, which is populated even in the event of termination with Ctrl-C; your sample call would then look as follows (as Theo notes, don't use the automatic $input variable as a custom variable):
# Look for blocks of interest in the input file, indefinitely,
# and output them as they're being found.
# After termination with Ctrl-C, $result will also contain the blocks
# found, if any.
Get-TextBetweenTwoStrings -OutVariable result -startPattern $startPattern -endPattern $endPattern -filePath $inputFile
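As a small illustration of how -OutVariable collects output on the side, here is a sketch with a hypothetical stand-in function, Get-DemoBlock, that simply emits three strings and terminates normally. It only demonstrates the collection mechanism; the Ctrl-C behavior described above is not exercised here.

```powershell
# Hypothetical stand-in for the real function: emits three values and returns.
function Get-DemoBlock {
    [CmdletBinding()]   # makes this an advanced function, enabling -OutVariable
    param()
    'block 1'
    'block 2'
    'block 3'
}

# The output still flows through the pipeline as usual ...
$count = (Get-DemoBlock -OutVariable demoResult | Measure-Object).Count

# ... and $demoResult has collected the same objects along the way.
# Note that the variable name is passed without the leading "$".
$collected = $demoResult.Count
```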
Per your feedback you want the block of lines to encompass the full lines on which the start and end patterns match, so the regexes below are enclosed in .* on either side.
The word pattern in your $startPattern and $endPattern parameters is a bit ambiguous in that it suggests that they themselves are regexes that can therefore be used as-is or embedded as-is in a larger regex on the RHS of the -match operator.
However, in the solution below I am assuming that they are meant to be treated as literal strings, which is why they are escaped with [regex]::Escape(); simply omit these calls if these parameters are indeed regexes themselves; i.e.:
$startRegex = '.*' + $startPattern + '.*'
$endRegex = '.*' + $endPattern + '.*'
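To see why the distinction matters, consider a marker that contains regex metacharacters. The following sketch uses a made-up marker, [ERROR], and two made-up log lines; none of them come from your file.

```powershell
# Hypothetical marker that contains regex metacharacters.
$marker = '[ERROR]'
$logLine = '2024-01-01 [ERROR] disk full'
$otherLine = 'no marker here'

# Used as-is, [ERROR] is a character class: it matches any single E, R or O
# (case-insensitively with -match), so unrelated lines match as well.
$falsePositive = $otherLine -match $marker

# Escaped, the marker only matches the literal text "[ERROR]".
$literalRegex = '.*' + [regex]::Escape($marker) + '.*'
$literalMiss = $otherLine -match $literalRegex
$literalHit = $logLine -match $literalRegex
```

Here $falsePositive is $true, whereas with escaping only the line that really contains the marker matches.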
The solution assumes there is no overlap between blocks and that, in a given block, the start and end patterns are on separate lines.
Each block found is output as a single, multi-line string, using LF ("`n") as the newline character; if you want CRLF newline sequences instead, use "`r`n"; for the platform-native newline format (CRLF on Windows, LF on Unix-like platforms), use [Environment]::NewLine.
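For reference, the three newline options differ only in the separator passed to -join. A quick sketch with two made-up lines:

```powershell
# Two hypothetical lines standing in for a collected block.
$demoLines = 'first line', 'second line'

$lf = $demoLines -join "`n"                       # LF only
$crlf = $demoLines -join "`r`n"                   # CRLF
$native = $demoLines -join [Environment]::NewLine # whatever the platform uses
```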
# Note the use of "-" after "Get", to adhere to PowerShell's
# "<Verb>-<Noun>" naming convention.
function Get-TextBetweenTwoStrings {
    # Make the function an advanced one, so that it supports the
    # -OutVariable common parameter.
    [CmdletBinding()]
    param(
        $startPattern,
        $endPattern,
        $filePath
    )
    # Note: If $startPattern and $endPattern are themselves
    # regexes, omit the [regex]::Escape() calls.
    $startRegex = '.*' + [regex]::Escape($startPattern) + '.*'
    $endRegex = '.*' + [regex]::Escape($endPattern) + '.*'
    $inBlock = $false
    $block = [System.Collections.Generic.List[string]]::new()
    Get-Content -Tail 100 -Wait $filePath | ForEach-Object {
        if ($inBlock) {
            if ($_ -match $endRegex) {
                $block.Add($Matches[0])
                # Output the block of lines as a single, multi-line string
                $block -join "`n"
                $inBlock = $false; $block.Clear()
            }
            else {
                $block.Add($_)
            }
        }
        elseif ($_ -match $startRegex) {
            $inBlock = $true
            $block.Add($Matches[0])
        }
    }
}
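If you want to try the block-tracking logic without a file that is being written to, one option is to move it into a function that accepts lines from the pipeline. The sketch below is a variation on the function above, not a replacement: the name Select-TextBlock and its parameter names are made up, the patterns are treated as literals, and the whole input line is added to the block instead of $Matches[0] (the two are equivalent here, because the .* wrappers match the entire line).

```powershell
# Hypothetical pipeline-friendly variant: receives lines, emits blocks.
function Select-TextBlock {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $StartPattern,
        [Parameter(Mandatory)] [string] $EndPattern,
        # AllowEmptyString lets blank lines inside a block pass through.
        [Parameter(Mandatory, ValueFromPipeline)] [AllowEmptyString()] [string] $Line
    )
    begin {
        # Treat both patterns as literal strings.
        $startRegex = [regex]::Escape($StartPattern)
        $endRegex = [regex]::Escape($EndPattern)
        $inBlock = $false
        $block = [System.Collections.Generic.List[string]]::new()
    }
    process {
        if ($inBlock) {
            $block.Add($Line)
            if ($Line -match $endRegex) {
                # Output the block as a single, multi-line string.
                $block -join "`n"
                $inBlock = $false
                $block.Clear()
            }
        }
        elseif ($Line -match $startRegex) {
            $inBlock = $true
            $block.Add($Line)
        }
    }
}

# Try it on in-memory sample lines (made up for this example).
$sample = 'noise', 'x START-OF-PATTERN y', 'body', '', 'z END-OF-PATTERN', 'more noise'
$blocks = @($sample | Select-TextBlock -StartPattern 'START-OF-PATTERN' -EndPattern 'END-OF-PATTERN')

# Against a live file, the number of tail lines becomes a choice of the caller:
# Get-Content -Tail 100 -Wait $inputFile |
#     Select-TextBlock -StartPattern $startPattern -EndPattern $endPattern
```

The same assumptions apply as before: blocks do not overlap, and the start and end patterns sit on separate lines.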