Search code examples
sql-serverpowershellblobbcp

Extracting a large blob from SQL Server to a file takes a very long time in PowerShell


I've been researching a way to automate extraction of blob columns to files, and this blog details a few ways to do it.

Via BCP, larger files extract from my database very quickly. I'm able to extract a 2 gigabyte file in under 20 seconds. Here's the sample command line I used, based on the examples in the blog:

BCP "SELECT PictureData FROM BLOB_Test.dbo.PicturesTest " QUERYOUT C:\BLOBTest\BlobOut\WC.jpg -T -f "C:\BLOBTest\FormatFile\BLOB.fmt" -S <ServerName>\<InstanceName>

As an aside, I had to learn about applying a format file to prevent a prefix string from being inserted into the file. This format file must be in BCP's older format, because the newer XML version of the format file has a schema entry for "PREFIX_LENGTH" that prevents a 0 value.

I would rather use PowerShell to extract the blobs, but the following code based on a TechNet article takes about two days to process instead of BCP's 20 seconds for the same 2 gig blob.

## https://social.technet.microsoft.com/wiki/contents/articles/890.export-sql-server-blob-data-with-powershell.aspx
## Export of "larger" SQL Server blob to file with GetBytes-Stream

# Configuration data
   $Server     = ".\<Instance>";         # SQL Server Instance
   $Database   = "Blob_Test";            # Name of database
   $Dest       = "C:\BLOBTest\BLOBOut\"; # Path to export to
   $bufferSize = 8192;                   # Stream buffer size in bytes

# Select-Statement for name & blob with filter
   $Sql = "Select 
              [PictureName],
              [PictureData]
           From 
              dbo.PicturesTest";

# Open ADO.NET Connection
   $con = New-Object Data.SqlClient.SqlConnection;
   $con.ConnectionString = "Data Source=$Server;" +
                           "Integrated Security=True;" +
                           "Initial Catalog=$Database";
   $con.Open();

# New Command and Reader
   $cmd = New-Object Data.SqlClient.SqlCommand $Sql, $con;
   $rd  = $cmd.ExecuteReader();

# Create a byte array for the stream
   $out = [array]::CreateInstance('Byte', $bufferSize)

# Loop through records
   While ($rd.Read()) {
      Write-Output ("Exporting: {0}" -f $rd.GetString(0));
      
      # New BinaryWriter
         $fs = New-Object System.IO.FileStream ($Dest + $rd.GetString(0)), Create, Write;
         $bw = New-Object System.IO.BinaryWriter $fs;
      
         $start = 0;

      # Read first byte stream
         $received = $rd.GetBytes(1, $start, $out, 0, $bufferSize - 1);
      
      While ($received -gt 0) {
         $bw.Write($out, 0, $received);
         $bw.Flush();
         $start += $received;
         
         # Read next byte stream
            $received = $rd.GetBytes(1, $start, $out, 0, $bufferSize - 1);
      }
      
      $bw.Close();
      $fs.Close();
   }

# Closing & disposing all objects
   $fs.Dispose();
   $rd.Close();
   $cmd.Dispose();
   $con.Close();

Write-Output ("Finished");

It does eventually finish, but I don't know why the script takes so long to complete.

Does anyone have an idea why the PowerShell script is being neutered?


Solution

  • You don't need the BinaryWriter at all. That class is only meant to write primitive types like integers, doubles, strings etc in a .NET-specific format. It's rarely used.

    If you want to write bytes to a file all you need is to use Stream.Write :

    $fs.Write $received
    

    A better idea that can eliminate almost all the code is to use DbDataReader.GetStream instead of GetBytes to read the BLOB as a stream. After that you can use Stream.CopyTo to write the stream's contents to another stream:

    $dbFs=$rd.GetStream(1); 
    $dbFs.CopyTo($fs);