Tags: c#, arrays, binaryformatter

Writing a huge array of longs to disk


I need to write huge arrays of longs (up to 5 GB) to disk. I tried using BinaryFormatter, but it seems to handle only arrays smaller than 2 GB:

long[] array = data.ToArray();
FileStream fs = new FileStream(dst, FileMode.Create);
BinaryFormatter formatter = new BinaryFormatter();
try
{
    formatter.Serialize(fs, array);
}
catch (SerializationException e)
{
    Console.WriteLine("Failed to serialize. Reason: " + e.Message);
    throw;
}
finally
{
    fs.Close();
}

This code throws an IndexOutOfRangeException for larger arrays.

I don't want to write the array element by element, because that takes too much time. Is there a proper way to save such a large array?

Writing element by element:

using (BinaryWriter writer = new BinaryWriter(File.Open(dst, FileMode.Create)))
{
    foreach(long v in array)
    {
        writer.Write(v);
    }
} 

This is very slow.


Solution

  • OK, so maybe I got a little carried away with the MMF. Here's a simpler version, using only a FileStream (I think this is what Scott Chamberlain suggested in the comments).

    Timings (on a new system) for a 3 GB array:

    1. MMF: ~50 seconds.
    2. FileStream: ~30 seconds.

    Code:

    long dataLen = 402653184; // 3 GB expressed as 8-byte elements
    long[] data = new long[dataLen]; //arrays over 2 GB need <gcAllowVeryLargeObjects> on .NET Framework
    int elementSize = sizeof(long);
    
    Stopwatch sw = Stopwatch.StartNew();
    using (FileStream f = new FileStream(@"D:\Test.bin", FileMode.OpenOrCreate, FileAccess.Write, FileShare.Read, 32768))
    {
        long offset = 0;            //element offset into the source array
        int workBufferSize = 32768; //bytes per write
        int chunkElements = workBufferSize / elementSize;
        long[] chunk = new long[chunkElements];
        byte[] workBuffer = new byte[workBufferSize];
        while (offset < dataLen)
        {
            //Array.Copy takes 64-bit indices, so it can reach past the 2 GB byte
            //boundary that Buffer.BlockCopy's int byte offsets cannot address
            Array.Copy(data, offset, chunk, 0, chunkElements);
            Buffer.BlockCopy(chunk, 0, workBuffer, 0, workBufferSize);
            f.Write(workBuffer, 0, workBufferSize);
    
            //advance in the source array (assumes dataLen is a multiple of chunkElements)
            offset += chunkElements;
        }
    }
    
    Console.WriteLine(sw.Elapsed);
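
    As a side note (not from the original answer): on .NET Core 2.1 or later you
    can skip the intermediate byte[] entirely by reinterpreting a slice of the
    long[] as bytes with MemoryMarshal.AsBytes and writing the span directly.
    A minimal sketch, with a hypothetical helper name DumpViaSpans:

    using System;
    using System.IO;
    using System.Runtime.InteropServices;

    static void DumpViaSpans(long[] data, string path)
    {
        const int chunkElements = 4096; //32 KB per write
        using (var f = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            for (long offset = 0; offset < data.LongLength; offset += chunkElements)
            {
                int count = (int)Math.Min(chunkElements, data.LongLength - offset);
                //element indices of a CLR array always fit in an int, even when
                //the array's size in bytes exceeds 2 GB, so this cast is safe
                f.Write(MemoryMarshal.AsBytes(data.AsSpan((int)offset, count)));
            }
        }
    }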
    

    Old solution, MMF

    I think you can try a MemoryMappedFile. I got ~2 to ~2.5 minutes for a 3 GB array on a relatively slow external drive.

    What this solution implies:

    1. First, create an empty file.
    2. Create a memory-mapped file over it with a capacity of X bytes, where X is the array length in bytes. This automatically sets the physical length of the file on disk to that value.
    3. Dump the array to the file through a view accessor that is 32k elements × 8 bytes wide (you can change this; it's just what I tested with). In other words, the array is written in chunks of 32k elements.

    Note that you will need to account for the case when the array length is not a multiple of chunkLength. For testing purposes, in my sample it is :). One way to handle the remainder is sketched after the listing below.

    See below:

    //Just create an empty file
    FileStream f = File.Create(@"D:\Test.bin");
    f.Close();
    
    long dataLen = 402653184; // 3 GB expressed as 8-byte elements
    long[] data = new long[dataLen];
    int elementSize = sizeof(long);
    
    Stopwatch sw = Stopwatch.StartNew();
    
    //Map the file with a capacity equal to the array's size in bytes. Since the
    //file was created empty, this also extends its physical length on disk.
    using (var mmf = MemoryMappedFile.CreateFromFile(@"D:\Test.bin", FileMode.Open, "longarray", data.LongLength * elementSize))
    {
        long offset = 0;
        int chunkLength = 32768; 
    
        while (offset < dataLen)
        {
            using (var accessor = mmf.CreateViewAccessor(offset * elementSize, chunkLength * elementSize))
            {
            for (long i = offset; i != offset + chunkLength; ++i)
            {
                //the accessor position is a byte offset within the view
                accessor.Write((i - offset) * elementSize, data[i]);
            }
            }
    
            offset += chunkLength;
        }
    }
    
    Console.WriteLine(sw.Elapsed);
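
    For the remainder case mentioned above, here is a minimal sketch (not part of
    the original answer; DumpViaMmf is a hypothetical name) of the same loop with
    the last chunk clamped, so it also works when the array length is not a
    multiple of chunkLength:

    using System;
    using System.IO;
    using System.IO.MemoryMappedFiles;

    static void DumpViaMmf(long[] data, string path)
    {
        int elementSize = sizeof(long);
        long dataLen = data.LongLength;
        //as above, the file is expected to exist already (e.g. File.Create + Close)
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open, "longarray", dataLen * elementSize))
        {
            long offset = 0;
            int chunkLength = 32768;
            while (offset < dataLen)
            {
                //clamp the final chunk to whatever is left
                int thisChunk = (int)Math.Min(chunkLength, dataLen - offset);
                using (var accessor = mmf.CreateViewAccessor(offset * elementSize, (long)thisChunk * elementSize))
                {
                    for (long i = 0; i != thisChunk; ++i)
                    {
                        accessor.Write(i * elementSize, data[offset + i]); //byte position within the view
                    }
                }
                offset += thisChunk;
            }
        }
    }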