Filtering and sorting log lines in PowerShell according to date string


I have two separate scripts that do this job. The first reads a large log, filters out specific lines from it, and writes them to a new log file. The second reads the first ten characters of each line in that new file (these hold the date of the log entry as yyyy-mm-dd) and, based on that date, appends the whole line to a target file whose name is derived from the date (targetfile-yymmdd.log). Since my original logs tend to contain entries that span two or more dates, I need to sort them so that each final log file contains entries for a single date and its file name reflects that date.

I would like to consolidate these two scripts into one: read a line from the log, check whether it matches the filter and, if it does, look at the first ten characters and write the line to the appropriate target file. Here are the basics, as I have them now:

Script 1 reads through a large log file (standard Apache htaccess log) and filters out lines based on a specific pattern, putting them in a new file:

$workingdate = Get-Date -Format 'yyMMdd'   # today's date as yymmdd
Get-Content "completelog-$workingdate.log" -ReadCount 200000 | 
foreach {
     $_ -match "(/(jsummit|popin|esa)/)" | 
     Add-Content "D:\logs\filteredlog-$workingdate.log"
}

Script 2 then goes through the new file and looks at the first ten characters of each line, which contain the date as yyyy-mm-dd. It appends that line to a file named targetfile-yymmdd.log, where the date comes from the line itself:

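$file = "D:\logs\filteredlog-$workingdate.log"   # output of script 1; $workingdate is today's date as yymmdd
$streamReader = New-Object System.IO.StreamReader -Arg "$file"
# Compare against $null so that an empty line does not end the loop early
while ($null -ne ($line = $streamReader.ReadLine())) {
    # 'MM' is the month; lowercase 'mm' would mean minutes
    $targetdate = [datetime]::ParseExact($line.Substring(0,10), 'yyyy-MM-dd', $null).ToString('yyMMdd')
    $targetfile = "targetfile-$targetdate.log"
    $line | Add-Content $targetfile
}
$streamReader.Close()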

Separately, these two work well, but since my log file is over 20GB, I'd like to cut down on the time it takes to go through the logs twice.


Solution

  • You could process each matched line directly and skip creating the intermediate file.

    $REGEX = "(/(jsummit|popin|esa)/)"   # filter pattern from script 1
    Get-Content "completelog-$workingdate.log" -ReadCount 200000 |
    %{ $_ } | ?{ $_ -match $REGEX } | %{
        # The first ten characters hold the date as yyyy-MM-dd
        $targetdate = '{0:yyMMdd}' -f (Get-Date $_.Substring(0,10))
        $_ | Add-Content "targetfile-$targetdate.log"
    }
    

    I am not sure this will improve overall performance, though: in my test, a 5MB file took about 100 seconds.
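If the per-line Add-Content call turns out to be the slow part (each call opens and closes the target file), one thing worth trying is to read with a StreamReader and keep one StreamWriter open per date. The following is only a sketch under assumptions: the function name Split-LogByDate and its parameters are made up for illustration, it needs PowerShell 5 or later for the ::new() syntax, the output directory must already exist, and I have not measured it against a 20GB log.

```powershell
# Hypothetical helper: single pass over the log, one open writer per date.
function Split-LogByDate {
    param(
        [Parameter(Mandatory)] [string] $Path,       # source log file
        [Parameter(Mandatory)] [string] $Pattern,    # regex used as the line filter
        [Parameter(Mandatory)] [string] $OutputDir   # existing folder for targetfile-yyMMdd.log
    )

    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so convert both to full paths first.
    $Path      = (Resolve-Path -LiteralPath $Path).ProviderPath
    $OutputDir = (Resolve-Path -LiteralPath $OutputDir).ProviderPath

    $regex   = [regex]::new($Pattern)
    $culture = [cultureinfo]::InvariantCulture
    $styles  = [System.Globalization.DateTimeStyles]::None
    $writers = @{}   # maps 'yyMMdd' to an open StreamWriter
    $reader  = [System.IO.StreamReader]::new($Path)
    try {
        while ($null -ne ($line = $reader.ReadLine())) {
            # Skip lines that are too short to carry a date or fail the filter
            if ($line.Length -lt 10 -or -not $regex.IsMatch($line)) { continue }

            # Skip lines whose first ten characters are not a yyyy-MM-dd date
            $parsed = [datetime]::MinValue
            $isDate = [datetime]::TryParseExact($line.Substring(0, 10), 'yyyy-MM-dd', $culture, $styles, [ref] $parsed)
            if (-not $isDate) { continue }

            $key = $parsed.ToString('yyMMdd')
            if (-not $writers.ContainsKey($key)) {
                $target = Join-Path $OutputDir "targetfile-$key.log"
                $writers[$key] = [System.IO.StreamWriter]::new($target, $true)   # $true = append
            }
            $writers[$key].WriteLine($line)
        }
    }
    finally {
        # Close the reader and flush/close every writer, even after an error
        $reader.Dispose()
        foreach ($writer in $writers.Values) { $writer.Dispose() }
    }
}
```

With the names from the question, a call would look like Split-LogByDate -Path "completelog-$workingdate.log" -Pattern "(/(jsummit|popin|esa)/)" -OutputDir "D:\logs". Since the log only spans a few dates, only a handful of writers stay open at once.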