Tags: powershell, hash, parallel-processing, binarywriter

PowerShell 7.0: how to compute the hash of a large file read in chunks


The script should copy files and compute their hash. My goal is a function that reads each file once instead of three times (read_for_copy + read_for_hash + read_for_another_copy) to minimize network load. So I tried reading a chunk of the file, computing the MD5 hash of it, and writing the chunk out to several places. File sizes range from 100 MB up to 2 TB, possibly more. There is no need to verify that the copies are identical at this point; I only need the hash of the source files.

I am stuck on computing the hash:

    $ifile = "C:\Users\User\Desktop\inputfile"
    $ofile = "C:\Users\User\Desktop\outputfile_1"
    $ofile2 = "C:\Users\User\Desktop\outputfile_2"
    
    $md5 = new-object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
    $bufferSize = 10mb
    $stream = [System.IO.File]::OpenRead($ifile)
    $makenew = [System.IO.File]::OpenWrite($ofile)
    $makenew2 = [System.IO.File]::OpenWrite($ofile2)
    $buffer = new-object Byte[] $bufferSize
    
    while ( $stream.Position -lt $stream.Length ) {
       
     $bytesRead = $stream.Read($buffer, 0, $bufferSize)
     $makenew.Write($buffer, 0, $bytesread) 
     $makenew2.Write($buffer, 0, $bytesread) 
    
     # I am stuck here
     $hash = [System.BitConverter]::ToString($md5.ComputeHash($buffer)) -replace "-",""      
            
            }
    
    $stream.Close()
    $makenew.Close()
    $makenew2.Close()

How can I combine the chunks so that the hash covers the whole file?

An extra question: is it possible to compute the hash and write the data out in parallel, given that workflow { parallel { } } is not supported from PowerShell 6 onward?

Many thanks


Solution

  • Final listing

    $ifile  = "C:\Users\User\Desktop\inputfile"
    $ofile  = "C:\Users\User\Desktop\outputfile_1"
    $ofile2 = "C:\Users\User\Desktop\outputfile_2"
    
    # MD5.Create() replaces MD5CryptoServiceProvider, which is marked obsolete in .NET 6 and later
    $md5 = [System.Security.Cryptography.MD5]::Create()
    $bufferSize = 1mb
    $stream   = [System.IO.File]::OpenRead($ifile)
    # Create() truncates an existing file; OpenWrite() would leave stale bytes past the new end
    $makenew  = [System.IO.File]::Create($ofile)
    $makenew2 = [System.IO.File]::Create($ofile2)
    $buffer = [byte[]]::new($bufferSize)
    
    try {
        # Read() returns 0 at end of file
        while (($bytesRead = $stream.Read($buffer, 0, $bufferSize)) -gt 0)
        {
            $makenew.Write($buffer, 0, $bytesRead)
            $makenew2.Write($buffer, 0, $bytesRead)
    
            # Feed the chunk into the running hash; the returned byte count is not needed
            $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
        }
    
        # Finalize with an empty block, then read the digest from .Hash
        $null = $md5.TransformFinalBlock($buffer, 0, 0)
        $hash = [BitConverter]::ToString($md5.Hash).Replace('-','')
        $hash
    }
    finally {
        # Dispose flushes and closes the streams even if the copy fails
        $makenew.Dispose()
        $makenew2.Dispose()
        $stream.Dispose()
        $md5.Dispose()
    }
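
On the extra question about parallelism: one option that does not need workflows is to start both writes with `WriteAsync`, update the hash while they are in flight, and wait for the writes before reusing the buffer. The following is only a minimal sketch under assumptions: `Copy-FileWithHash` is a hypothetical helper name, the temp file paths are made up for the demo, and whether the overlap actually saves time depends on the storage and network involved. The last lines compare the chunked result with `Get-FileHash` as a sanity check that `TransformBlock` over chunks gives the same MD5 as hashing the file in one go.

```powershell
# Hypothetical helper: copies $Path to every path in $Destination and hashes it in one pass.
function Copy-FileWithHash {
    param(
        [Parameter(Mandatory)] [string]   $Path,
        [Parameter(Mandatory)] [string[]] $Destination,
        [int] $BufferSize = 1MB
    )

    $md5     = [System.Security.Cryptography.MD5]::Create()
    $source  = [System.IO.File]::OpenRead($Path)
    $targets = @($Destination | ForEach-Object { [System.IO.File]::Create($_) })
    $buffer  = [byte[]]::new($BufferSize)

    try {
        while (($bytesRead = $source.Read($buffer, 0, $buffer.Length)) -gt 0) {
            # Start all writes; each call returns a Task without waiting for completion.
            $writes = @($targets | ForEach-Object { $_.WriteAsync($buffer, 0, $bytesRead) })

            # Update the running hash while the writes are in flight (it only reads the buffer).
            $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)

            # The buffer is reused by the next Read, so wait for every write first.
            foreach ($task in $writes) { $task.Wait() }
        }

        # Finalize with an empty block, then return the digest as a hex string.
        $null = $md5.TransformFinalBlock($buffer, 0, 0)
        [System.BitConverter]::ToString($md5.Hash).Replace('-', '')
    }
    finally {
        foreach ($t in $targets) { $t.Dispose() }
        $source.Dispose()
        $md5.Dispose()
    }
}

# Demo with a small throwaway file (paths are illustrative only).
$tmp = [System.IO.Path]::GetTempPath()
$src = Join-Path $tmp 'hashdemo_src.bin'
$dst = 1..2 | ForEach-Object { Join-Path $tmp "hashdemo_dst_$_.bin" }

# 3 MB plus a partial chunk, so the last Read returns fewer bytes than the buffer size.
$data = [byte[]]::new(3MB + 123)
[System.Random]::new(42).NextBytes($data)
[System.IO.File]::WriteAllBytes($src, $data)

$chunked  = Copy-FileWithHash -Path $src -Destination $dst -BufferSize 1MB
$expected = (Get-FileHash -Path $src -Algorithm MD5).Hash

"chunked : $chunked"
"expected: $expected"
```

Waiting for both writes inside the loop keeps the sketch simple and safe with a single buffer. Overlapping the next read with the current writes would need at least two buffers and is left out here.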