powershell, file-io, file-copying, tee, powershell-5.1

How can I use tar and tee in PowerShell to do a read once, write many, raw file copy?


I'm using a small laptop to copy video files on location to multiple memory sticks (~8GB). The copy has to be done without supervision once it's started and has to be fast.

I've identified a serious limit on the speed: when making several copies (e.g. 4 sticks from 2 cameras, i.e. 8 transfers * 8GB), the repeated reads use a lot of bandwidth, especially since the cameras have USB 2.0 interfaces (two ports) with limited capacity.

If I had Unix I could use something like tar -cf - | tee tar -xf /stick1 | tee tar -xf /stick2 etc., which means I'd only have to pull one copy from each camera (2*8GB in total) over the USB 2.0 interface.

The memory sticks are generally on a hub on the single USB 3.0 interface, which is driven on a different channel, so the writes are sufficiently fast.

For reasons, I'm stuck using the current Win10 PowerShell.

I'm currently writing the whole command to a string (concatenating the various sources and the various targets) and then using Invoke-Process to execute the copy process while I'm entertaining and buying the rounds in the pub after the shoot. (hence the necessity to be afk).

I can tar cf - | tar xf a single file, but can't seem to get the tee functioning correctly.

I can also successfully use the microSD slot to read a single camera's card, which is not as physically nice but is fast for one camera's recording. I still have the bandwidth issue on the remaining camera(s), though. We may end up with 4-5 source cameras at the same time, which means read once, write many is still going to be an issue.

Edit: I've just advanced to play with Get-Content -raw | tee \stick1\f1 | tee \stick2\f1 | out-null . Haven't done timings or file verification yet....

Edit2: It seems like Get-Content -raw works properly, but PowerShell pipelines violate two of the fundamental Commandments of programming: "A program shall do one thing and do it well" and "Thou shalt not mess with the data stream". For some unknown reason PowerShell's default (and only) pipeline behaviour always modifies the data stream it is supposed to transfer from one process to the next. It doesn't seem to have a -raw option, nor does it seem to have a $session or $global setting I can use to remedy the mutilation.
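
To illustrate the kind of damage described above, here is a minimal sketch. It does not reproduce exactly what Get-Content and tee do internally. It only shows the general mechanism: once arbitrary bytes are decoded into a .NET string (which is what Get-Content -raw produces) and later encoded back to bytes, the result need not match the input. The sample bytes and the choice of UTF-8 are assumptions made for the demonstration.

```powershell
# Arbitrary binary data (illustrative values); these bytes are not valid UTF-8 text.
[byte[]] $original = 0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10

# Decode to a string, as any text-oriented step must, then encode back to bytes.
$asText = [Text.Encoding]::UTF8.GetString($original)
[byte[]] $roundTripped = [Text.Encoding]::UTF8.GetBytes($asText)

# Compare the hex dumps of the two byte arrays.
$originalHex     = [BitConverter]::ToString($original)
$roundTrippedHex = [BitConverter]::ToString($roundTripped)
$identical       = $originalHex -eq $roundTrippedHex

"original:      $originalHex"
"round-tripped: $roundTrippedHex"
"identical:     $identical"
```

The invalid byte sequences are replaced during decoding, so the round-tripped data differs from the original. Avoiding the string stage altogether, for example by working on byte arrays with .NET streams, sidesteps the problem.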

How do PowerShell people transfer raw binary from one stream out, into the next process?


Solution

  • Maybe not quite what you want (if you insist on using built-in PowerShell commands), but if you care about speed, use streams and asynchronous Read/Write. PowerShell is a great tool because it can use any .NET class seamlessly.

    The script below can easily be extended to write to more than 2 destinations and can potentially handle arbitrary streams. You might want to add some error handling via try/catch there too. You may also want to experiment with buffered streams and various buffer sizes to optimize the code.


    -- 2021-12-09 update: Code is modified a little to reflect suggestions from comments.

    # $InputPath, $Output1Path, $Output2Path are parameters
    [Threading.CancellationTokenSource] $cancellationTokenSource = [Threading.CancellationTokenSource]::new()
    [Threading.CancellationToken] $cancellationToken = $cancellationTokenSource.Token
    
    [int] $bufferSize = 64*1024
    
    $fileStreamIn = [IO.FileStream]::new($inputPath,[IO.FileMode]::Open,[IO.FileAccess]::Read,[IO.FileShare]::None,$bufferSize,[IO.FileOptions]::SequentialScan)
    $fileStreamOut1 = [IO.FileStream]::new($output1Path,[IO.FileMode]::CreateNew,[IO.FileAccess]::Write,[IO.FileShare]::None,$bufferSize)
    $fileStreamOut2 = [IO.FileStream]::new($output2Path,[IO.FileMode]::CreateNew,[IO.FileAccess]::Write,[IO.FileShare]::None,$bufferSize)
    
    try{
        [Byte[]] $bufferToWriteFrom = [byte[]]::new($bufferSize)
        [Byte[]] $bufferToReadTo = [byte[]]::new($bufferSize)
        $Time = [System.Diagnostics.Stopwatch]::StartNew()
    
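        # Prime the loop with a synchronous first read.
        $bytesRead = $fileStreamIn.Read($bufferToReadTo,0,$bufferSize)
    
        while ($bytesRead -gt 0){
            # Swap buffers: the chunk just read becomes the write source,
            # and the other buffer is free to receive the next read.
            $bufferToWriteFrom,$bufferToReadTo = $bufferToReadTo,$bufferToWriteFrom    
            # Start both writes and the next read so that they overlap.
            $writeTask1 = $fileStreamOut1.WriteAsync($bufferToWriteFrom,0,$bytesRead,$cancellationToken)
            $writeTask2 = $fileStreamOut2.WriteAsync($bufferToWriteFrom,0,$bytesRead,$cancellationToken)
            $readTask = $fileStreamIn.ReadAsync($bufferToReadTo,0,$bufferSize,$cancellationToken)
            # Wait for all three operations before the buffers are swapped again.
            $writeTask1.Wait()
            $writeTask2.Wait()
            $bytesRead = $readTask.GetAwaiter().GetResult()    
        }
        # Output the elapsed copy time in seconds.
        $time.Elapsed.TotalSeconds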
    }
    catch {
        throw $_
    }
    finally{
        $fileStreamIn.Close()
        $fileStreamOut1.Close()
        $fileStreamOut2.Close()
    }
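
As the answer notes, the same pattern can be extended to more than two destinations. The following is a minimal sketch of one way to do that, not a tested production script. The function name `Copy-FileToMany` and its parameter names are hypothetical, the 64KB default buffer size is carried over from the script above, and error handling is limited to closing the streams. Paths are assumed to be absolute, since .NET resolves relative paths against the process working directory rather than the current PowerShell location.

```powershell
# Hypothetical helper (not part of the original answer): read the source once, write to N targets.
function Copy-FileToMany {
    param(
        [Parameter(Mandatory)] [string]   $InputPath,   # absolute path of the source file
        [Parameter(Mandatory)] [string[]] $OutputPath,  # absolute paths of the files to create
        [int] $BufferSize = 64KB
    )
    $source       = $null
    $destinations = [Collections.Generic.List[IO.FileStream]]::new()
    try {
        $source = [IO.FileStream]::new($InputPath, [IO.FileMode]::Open, [IO.FileAccess]::Read, [IO.FileShare]::Read, $BufferSize, [IO.FileOptions]::SequentialScan)
        foreach ($path in $OutputPath) {
            # CreateNew fails if the target already exists, so nothing is overwritten silently.
            $destinations.Add([IO.FileStream]::new($path, [IO.FileMode]::CreateNew, [IO.FileAccess]::Write, [IO.FileShare]::None, $BufferSize))
        }

        $readBuffer  = [byte[]]::new($BufferSize)
        $writeBuffer = [byte[]]::new($BufferSize)

        # Prime the loop with a synchronous first read.
        $bytesRead = $source.Read($readBuffer, 0, $BufferSize)
        while ($bytesRead -gt 0) {
            # Swap buffers: the chunk just read becomes the write source.
            $writeBuffer, $readBuffer = $readBuffer, $writeBuffer

            # Start one write per destination, plus the next read, so that they overlap.
            $writeTasks = foreach ($destination in $destinations) {
                $destination.WriteAsync($writeBuffer, 0, $bytesRead)
            }
            $readTask = $source.ReadAsync($readBuffer, 0, $BufferSize)

            # Wait for every write and for the read before swapping again.
            [Threading.Tasks.Task]::WaitAll([Threading.Tasks.Task[]] @($writeTasks))
            $bytesRead = $readTask.GetAwaiter().GetResult()
        }
    }
    finally {
        # Close whatever was opened, even if a stream failed part-way through.
        if ($source) { $source.Dispose() }
        foreach ($destination in $destinations) { $destination.Dispose() }
    }
}
```

It could be called as `Copy-FileToMany -InputPath $src -OutputPath $stick1File, $stick2File` (the variable names here are placeholders), and comparing `(Get-FileHash $src).Hash` with the hash of each copy afterwards is a simple way to verify the result. Each loop iteration is only as fast as the slowest stick, because the next chunk is not written until all destinations have finished the current one.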