
How do you pack a large file into a .tar?


With the following code, a lot of RAM is used:

byte[] buffer = ArrayPool<byte>.Shared.Rent(1024 * 1024);
string item = @"R:\Bigfile.dat";
using (FileStream _writestream = File.Create(@"R:\test.tar"))
using (TarWriter tarwriter = new(_writestream, TarEntryFormat.Pax, leaveOpen: false))
using (FileStream srcFile = new(item, FileMode.Open, FileAccess.Read))
{
    string fileName = item.Remove(0, mainDir.Length + 1).Replace('\\', '/');
    PaxTarEntry te = new(TarEntryType.RegularFile, fileName)
    {
        DataStream = new MemoryStream()
    };
    
    int currentBlockSize = 0;
    while ((currentBlockSize = srcFile.Read(buffer)) > 0)
    {
        te.DataStream.Write(buffer.AsSpan(0, currentBlockSize));
    }
    te.DataStream.Position = 0;
    tarwriter.WriteEntry(te);
}
ArrayPool<byte>.Shared.Return(buffer);

As written, this uses a lot of memory, presumably because the entire file is first copied into the MemoryStream. What are the better options for writing a VERY large file? I want the data to go to the .tar file rather than into RAM, and I want to control how much is written per iteration of the write loop via the buffer size.


Solution

  • I don't think you need to copy the input stream into memory at all. Here's a short but complete program which builds a tar file from the files named in its command line arguments:

    using System.Formats.Tar;
    
    using var output = File.Create("output.tar");
    using var tarWriter = new TarWriter(output, TarEntryFormat.Pax);
    
    foreach (var arg in args)
    {
        using var input = File.OpenRead(arg);
        var entry = new PaxTarEntry(TarEntryType.RegularFile, arg)
        {
            DataStream = input
        };
        tarWriter.WriteEntry(entry);
    }
    

    I've just used that code to create a 20GB tar file (from several 1-3GB files), and Task Manager only showed it using 3.6MB...
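
The question also asked about controlling the buffer size. The following is a minimal sketch of the same streaming approach wrapped in a helper, assuming that tuning the source FileStream's internal buffer is an acceptable stand-in for that. `TarPacker`, `PackFile`, and the `bufferSize` parameter are hypothetical names introduced for illustration; they are not part of System.Formats.Tar. Note that `bufferSize` only sets the FileStream's own read buffer; how TarWriter chunks the copy internally is not something this sketch controls.

```csharp
using System.Formats.Tar;
using System.IO;

// Hypothetical helper for illustration; not part of System.Formats.Tar.
public static class TarPacker
{
    // Packs a single file into a new tar archive without buffering
    // the whole file in memory.
    public static void PackFile(string sourcePath, string tarPath, int bufferSize = 1024 * 1024)
    {
        using FileStream output = File.Create(tarPath);
        using TarWriter tarWriter = new(output, TarEntryFormat.Pax, leaveOpen: false);

        // bufferSize configures the FileStream's internal read buffer only.
        using FileStream input = new(sourcePath, FileMode.Open, FileAccess.Read,
                                     FileShare.Read, bufferSize);

        // Hand the file stream itself to the entry instead of a MemoryStream,
        // so the data is streamed from disk when the entry is written.
        PaxTarEntry entry = new(TarEntryType.RegularFile, Path.GetFileName(sourcePath))
        {
            DataStream = input
        };
        tarWriter.WriteEntry(entry);
    }
}
```

With the paths from the question, a call would look like `TarPacker.PackFile(@"R:\Bigfile.dat", @"R:\test.tar");`. If no custom stream is needed, `TarWriter.WriteEntry(string fileName, string? entryName)` reads the file from disk itself and may be simpler still.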