Tags: windows, powershell, batch-file, tail

Get last n lines or bytes of a huge file in Windows (like Unix's tail), avoiding time-consuming options


I need to retrieve the last n lines of huge files (1-4 GB) in Windows 7. Due to corporate restrictions, I cannot run any command that is not built in. The problem is that every solution I have found appears to read the whole file, which makes them extremely slow.

Can this be accomplished quickly?

Notes:

  1. I managed to get the first n lines quickly.
  2. It is OK if I get the last n bytes instead. (I used https://stackoverflow.com/a/18936628/2707864 for the first n bytes; see the sketch after this list.)
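
A minimal sketch of that first-n-bytes approach (a reconstruction in the same spirit, not the linked answer verbatim):

    # Sketch (reconstruction, not the linked answer verbatim): read only
    # the first $N bytes of the file, never touching the rest of it.
    $N = 8
    $fpath = "C:\10GBfile.dat"
    $fs = [IO.File]::OpenRead($fpath)
    $buffer = New-Object Byte[] $N
    $fs.Read($buffer, 0, $N) | Out-Null              # read $N bytes from the start
    $fs.Close()
    [System.Text.Encoding]::UTF8.GetString($buffer)  # decode, assuming UTF-8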

The solutions in Unix tail equivalent command in Windows Powershell did not work: using -Wait does not make it fast, and I do not have -Tail (nor do I know whether it would be fast).

PS: There are quite a few related questions for head and tail, but they do not focus on speed, so their useful or accepted answers may not apply here. E.g.,

Windows equivalent of the 'tail' command

CMD.EXE batch script to display last 10 lines from a txt file

Extract N lines from file using single windows command

https://serverfault.com/questions/490841/how-to-display-the-first-n-lines-of-a-command-output-in-windows-the-equivalent

powershell to get the first x MB of a file

https://superuser.com/questions/859870/windows-equivalent-of-the-head-c-command


Solution

  • How about this (reads the last 8 bytes as a demo):

    $fpath = "C:\10GBfile.dat"
    $fs = [IO.File]::OpenRead($fpath)
    $fs.Seek(-8, 'End') | Out-Null    # position 8 bytes before the end of the file
    for ($i = 0; $i -lt 8; $i++)
    {
        $fs.ReadByte()                # emits each byte as its numeric value
    }
    $fs.Close()
    

    UPDATE. To interpret the bytes as a string (but be sure to select the correct encoding - UTF-8 is used here):

    $N = 8
    $fpath = "C:\10GBfile.dat"
    $fs = [IO.File]::OpenRead($fpath)
    $fs.Seek(-$N, [System.IO.SeekOrigin]::End) | Out-Null   # position $N bytes before EOF
    $buffer = New-Object Byte[] $N
    $fs.Read($buffer, 0, $N) | Out-Null                     # read the last $N bytes
    $fs.Close()
    [System.Text.Encoding]::UTF8.GetString($buffer)         # decode them as UTF-8
    

    UPDATE 2. To read the last M lines, we read the file backwards in chunks until the accumulated text contains at least M newline sequences:

    $M = 3
    $fpath = "C:\10GBfile.dat"
    
    $result = ""
    $seq = "`r`n"
    $buffer_size = 10
    $buffer = New-Object Byte[] $buffer_size
    
    $fs = [IO.File]::OpenRead($fpath)
    $bytes_read = 0    # track bytes consumed; $result.Length counts characters,
                       # which can differ from bytes for multi-byte UTF-8 text
    while (([regex]::Matches($result, $seq)).Count -lt $M)
    {
        $bytes_read += $buffer_size
        $fs.Seek(-$bytes_read, [System.IO.SeekOrigin]::End) | Out-Null   # step one chunk further back from EOF
        $fs.Read($buffer, 0, $buffer_size) | Out-Null
        $result = [System.Text.Encoding]::UTF8.GetString($buffer) + $result   # prepend the new chunk
    }
    $fs.Close()
    
    ($result -split $seq) | Select -Last $M
    

    Try playing with a bigger $buffer_size - ideally it should be close to the expected average line length, so that fewer disk reads are needed. Also pay attention to $seq - it could be \r\n or just \n. This is very dirty code without any error handling or optimizations.
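
    If the line-ending style is not known in advance, one variant (my addition, not part of the original answer) is to split on a bare `n and strip any trailing `r, so the same loop handles both CRLF and LF files:

    # Variant sketch (my addition, not from the original answer): split on
    # "`n" and trim a trailing "`r", so both Windows (CRLF) and Unix (LF)
    # line endings work. Like the original, it assumes the file holds more
    # bytes than will be requested.
    $M = 3
    $fpath = "C:\10GBfile.dat"
    
    $result = ""
    $buffer_size = 1024                 # bigger chunks mean fewer disk reads
    $buffer = New-Object Byte[] $buffer_size
    
    $fs = [IO.File]::OpenRead($fpath)
    $bytes_read = 0
    while (([regex]::Matches($result, "`n")).Count -lt $M)
    {
        $bytes_read += $buffer_size
        $fs.Seek(-$bytes_read, [System.IO.SeekOrigin]::End) | Out-Null
        $fs.Read($buffer, 0, $buffer_size) | Out-Null
        $result = [System.Text.Encoding]::UTF8.GetString($buffer) + $result
    }
    $fs.Close()
    
    ($result -split "`n") | Select -Last $M | ForEach-Object { $_.TrimEnd("`r") }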