c# byte filestream

Handling big file streams (read + write bytes)


The following code:

  1. Reads all bytes from an input file
  2. Keeps only part of the file in outbytes
  3. Writes the extracted bytes to the output file

    byte[] outbytes = File.ReadAllBytes(sourcefile).Skip(offset).Take(size).ToArray();
    File.WriteAllBytes(outfile, outbytes);

But each step is limited to ~2 GB of data.

Edit: The extracted bytes size can also be greater than 2GB.

How can I handle big files? What is the best way to do this with good performance, regardless of file size?

Thanks!


Solution

  • It is better to stream the data from one file to the other, only loading small parts of it into memory:

    public static void CopyFileSection(string inFile, string outFile, long startPosition, long size)
    {
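        // Open the files as streams. File.Create truncates an existing output file,
        // matching File.WriteAllBytes; File.OpenWrite would leave stale trailing bytes
        // if the output file already existed and was longer than the copied section.
        using (var inStream = File.OpenRead(inFile))
        using (var outStream = File.Create(outFile))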
        {
            // seek to the start position
            inStream.Seek(startPosition, SeekOrigin.Begin);
    
            // Create a variable to track how much more to copy
            // and a buffer to temporarily store a section of the file
            long remaining = size;
            byte[] buffer = new byte[81920];
    
            do
            {
                // Read the smaller of 81920 or remaining and break out of the loop if we've already reached the end of the file
                int bytesRead = inStream.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                if (bytesRead == 0) { break; }
    
                // Write the buffered bytes to the output file
                outStream.Write(buffer, 0, bytesRead);
                remaining -= bytesRead;
            }
            while (remaining > 0);
        }
    }
    

    Usage:

    CopyFileSection(sourcefile, outfile, offset, size);
    

    This should have equivalent functionality to your current method without the overhead of reading the entire file, regardless of its size, into memory.

    Note: If you're doing this in code that uses async/await, you should change CopyFileSection to be public static async Task CopyFileSection and change inStream.Read and outStream.Write to await inStream.ReadAsync and await outStream.WriteAsync respectively.
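
For reference, the async variant described in the note might look roughly like the sketch below. The names `FileSectionCopier` and `CopyFileSectionAsync`, the optional `CancellationToken` parameter, and opening the streams with `useAsync: true` and `FileMode.Create` are my own assumptions rather than part of the answer above; treat it as a starting point, not a definitive implementation.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class FileSectionCopier // hypothetical helper class name
{
    public static async Task CopyFileSectionAsync(
        string inFile, string outFile, long startPosition, long size,
        CancellationToken cancellationToken = default) // token is an added assumption
    {
        const int BufferSize = 81920;

        // useAsync: true asks for asynchronous file I/O where the platform supports it.
        // FileMode.Create truncates an existing output file, like File.WriteAllBytes.
        using (var inStream = new FileStream(inFile, FileMode.Open, FileAccess.Read,
                                             FileShare.Read, BufferSize, useAsync: true))
        using (var outStream = new FileStream(outFile, FileMode.Create, FileAccess.Write,
                                              FileShare.None, BufferSize, useAsync: true))
        {
            // Seek to the start position (a long, so offsets beyond 2 GB are fine)
            inStream.Seek(startPosition, SeekOrigin.Begin);

            long remaining = size;
            byte[] buffer = new byte[BufferSize];

            while (remaining > 0)
            {
                // Read at most one buffer, or whatever is left of the requested section
                int toRead = (int)Math.Min(buffer.Length, remaining);
                int bytesRead = await inStream.ReadAsync(buffer, 0, toRead, cancellationToken);
                if (bytesRead == 0) { break; } // end of file reached early

                await outStream.WriteAsync(buffer, 0, bytesRead, cancellationToken);
                remaining -= bytesRead;
            }
        }
    }
}
```

Usage from an async method:

    await FileSectionCopier.CopyFileSectionAsync(sourcefile, outfile, offset, size);

Only one buffer's worth of data is held in memory at a time, so memory use does not depend on the size of the file or of the extracted section.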