I need to retrieve the last n lines of huge files (1-4 GB) on Windows 7. Due to corporate restrictions, I cannot run any command that is not built-in. The problem is that every solution I have found appears to read the whole file, which makes it extremely slow.
Can this be accomplished quickly?
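For example, the obvious built-in pipeline (the path below is just an illustration) has to stream the entire file line by line before the last lines come out:
Get-Content "C:\hugefile.dat" | Select-Object -Last 10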
Notes:
The solutions in "Unix tail equivalent command in Windows Powershell" did not work.
Using -Wait does not make it fast. I do not have -Tail (and I do not know whether it would be fast).
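For reference, the syntax on PowerShell 3.0 and newer would be the line below; I cannot use it in my environment, and I have not been able to verify whether it seeks from the end or scans the whole file:
Get-Content "C:\hugefile.dat" -Tail 10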
PS: There are quite a few related questions for head and tail, but they do not focus on speed, so their useful or accepted answers may not help here. E.g.:
Windows equivalent of the 'tail' command
CMD.EXE batch script to display last 10 lines from a txt file
Extract N lines from file using single windows command
powershell to get the first x MB of a file
https://superuser.com/questions/859870/windows-equivalent-of-the-head-c-command
How about this (reads last 8 bytes for demo):
$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-8, 'End') | Out-Null   # jump straight to 8 bytes before the end; nothing earlier is read
for ($i = 0; $i -lt 8; $i++)
{
    $fs.ReadByte()               # each call returns the next byte as an Int32
}
$fs.Close()
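A small variation on the same idea (my own sketch, not part of the original) prints those byte values as hex, which helps when checking what the file actually ends with:
$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-8, 'End') | Out-Null
$hex = for ($i = 0; $i -lt 8; $i++) { '{0:X2}' -f $fs.ReadByte() }
$fs.Close()
$hex -join ' '   # e.g. a trailing "0D 0A" means the file ends with CRLF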
UPDATE. To interpret the bytes as a string, be sure to select the correct encoding (UTF-8 is used here):
$N = 8
$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$N, [System.IO.SeekOrigin]::End) | Out-Null   # position $N bytes before the end
$buffer = New-Object Byte[] $N
$fs.Read($buffer, 0, $N) | Out-Null                     # read the last $N bytes
$fs.Close()
[System.Text.Encoding]::UTF8.GetString($buffer)         # decode them as UTF-8 text
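If the file is not UTF-8, only the decoder and the bytes-per-character change. A sketch under the assumption that the file is UTF-16 LE (two bytes per character):
$N = 8
$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-(2 * $N), [System.IO.SeekOrigin]::End) | Out-Null   # 2 bytes per UTF-16 character
$buffer = New-Object Byte[] (2 * $N)
$fs.Read($buffer, 0, 2 * $N) | Out-Null
$fs.Close()
[System.Text.Encoding]::Unicode.GetString($buffer)            # .NET "Unicode" = UTF-16 LE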
UPDATE 2. To read the last M lines, we read the file backwards in chunks until the accumulated text contains at least M newline sequences:
$M = 3
$fpath = "C:\10GBfile.dat"
$result = ""
$seq = "`r`n"
$buffer_size = 10
$buffer = New-Object Byte[] $buffer_size
$fs = [IO.File]::OpenRead($fpath)
$bytes_read = 0
# Read fixed-size chunks backwards from the end until the accumulated
# text contains at least $M newline sequences.
while (([regex]::Matches($result, $seq)).Count -lt $M)
{
    if ($bytes_read + $buffer_size -gt $fs.Length) { break }   # don't seek past the start of the file
    # Count consumed bytes rather than decoded characters, so multi-byte
    # UTF-8 characters do not skew the next seek offset.
    $bytes_read += $buffer_size
    $fs.Seek(-$bytes_read, [System.IO.SeekOrigin]::End) | Out-Null
    $fs.Read($buffer, 0, $buffer_size) | Out-Null
    $result = [System.Text.Encoding]::UTF8.GetString($buffer) + $result
}
$fs.Close()
($result -split $seq) | Select -Last $M
Try playing with a bigger $buffer_size - ideally it should match the expected average line length, so fewer disk reads are needed. Also pay attention to $seq - the line separator could be \r\n (Windows) or just \n (Unix-style files).
This is still very dirty code, without full error handling or optimization - for example, a multi-byte UTF-8 character split across a chunk boundary will not decode correctly.
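For convenience, the same chunked backwards read can be wrapped as a reusable function. This is only a sketch: the name Get-Tail and its parameters are my own, and the split-character edge case above is still not handled:
function Get-Tail {
    param(
        [string]$Path,
        [int]$Lines = 10,
        [int]$BufferSize = 65536,     # bigger chunks = fewer disk reads
        [string]$Seq = "`r`n"         # use "`n" for Unix-style files
    )
    $fs = [IO.File]::OpenRead($Path)
    try {
        $buffer = New-Object Byte[] $BufferSize
        $result = ""
        $bytesRead = [long]0
        while (([regex]::Matches($result, $Seq)).Count -lt $Lines) {
            # Clamp the chunk size so we never seek past the start of the file
            $chunk = [int][Math]::Min([long]$BufferSize, $fs.Length - $bytesRead)
            if ($chunk -le 0) { break }   # whole file consumed
            $bytesRead += $chunk
            $fs.Seek(-$bytesRead, [System.IO.SeekOrigin]::End) | Out-Null
            $fs.Read($buffer, 0, $chunk) | Out-Null
            $result = [System.Text.Encoding]::UTF8.GetString($buffer, 0, $chunk) + $result
        }
    }
    finally { $fs.Close() }
    ($result -split $Seq) | Select-Object -Last $Lines
}

# Usage:
Get-Tail -Path "C:\10GBfile.dat" -Lines 3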