Tags: c#, file-io, filestream, sparse-matrix

How to create fast and efficient filestream writes on large sparse files


I have an application that writes large files in multiple segments. I use FileStream.Seek to position each write. It appears that when I call FileStream.Write at a deep position in a sparse file, the write triggers a slow "backfill" operation (writing 0s) on all preceding bytes.

Is there a more efficient way of handling this situation?

The code below demonstrates the problem. The initial write takes about 370 ms on my machine.

    public void WriteToStream()
    {
        DateTime dt;
        using (FileStream fs = File.Create("C:\\testfile.file"))
        {   
            fs.SetLength(1024 * 1024 * 100);
            fs.Seek(-1, SeekOrigin.End);
            dt = DateTime.Now;
            fs.WriteByte(255);              
        }

        Console.WriteLine(@"WRITE MS: " + DateTime.Now.Subtract(dt).TotalMilliseconds.ToString());
    }

Solution

  • NTFS does support sparse files; however, there is no way to use them from .NET without P/Invoking some native methods.

    It is not very hard to mark a file as sparse. Just be aware that once a file is marked as sparse it stays sparse; the simplest way to get a non-sparse file back is to copy the entire file into a new non-sparse file.

    Example usage

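    using System;
    using System.ComponentModel;            // Win32Exception
    using System.Diagnostics;               // Stopwatch
    using System.IO;                        // File, FileStream
    using System.Runtime.InteropServices;   // DllImport
    using System.Threading;                 // NativeOverlapped
    using Microsoft.Win32.SafeHandles;      // SafeFileHandle

    class Program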
    {
        [DllImport("Kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
        private static extern bool DeviceIoControl(
            SafeFileHandle hDevice,
            int dwIoControlCode,
            IntPtr InBuffer,
            int nInBufferSize,
            IntPtr OutBuffer,
            int nOutBufferSize,
            ref int pBytesReturned,
            [In] ref NativeOverlapped lpOverlapped
        );
    
        static void MarkAsSparseFile(SafeFileHandle fileHandle)
        {
            int bytesReturned = 0;
            NativeOverlapped lpOverlapped = new NativeOverlapped();
            bool result =
                DeviceIoControl(
                    fileHandle,
                    590020, //FSCTL_SET_SPARSE,
                    IntPtr.Zero,
                    0,
                    IntPtr.Zero,
                    0,
                    ref bytesReturned,
                    ref lpOverlapped);
            if(result == false)
                throw new Win32Exception();
        }
    
        static void Main()
        {
            //Use stopwatch when benchmarking, not DateTime
            Stopwatch stopwatch = new Stopwatch();
    
            stopwatch.Start();
            using (FileStream fs = File.Create(@"e:\Test\test.dat"))
            {
                MarkAsSparseFile(fs.SafeFileHandle);
    
                fs.SetLength(1024 * 1024 * 100);
                fs.Seek(-1, SeekOrigin.End);
                fs.WriteByte(255);
            }
            stopwatch.Stop();
    
            //Measured: about 2 ms for a sparse file versus 1127 ms for a non-sparse one
            Console.WriteLine(@"WRITE MS: " + stopwatch.ElapsedMilliseconds); 
        }
    }
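
    The magic number 590020 passed to DeviceIoControl above is the value of FSCTL_SET_SPARSE. As a minimal sketch of where it comes from, the following reproduces the CTL_CODE macro from winioctl.h. The class and method names (SparseIoctl, CtlCode) are made up for illustration and are not part of any library.

```csharp
using System;

static class SparseIoctl
{
    // Values taken from winioctl.h in the Windows SDK.
    const uint FILE_DEVICE_FILE_SYSTEM = 0x00000009;
    const uint METHOD_BUFFERED = 0;
    const uint FILE_SPECIAL_ACCESS = 0; // same value as FILE_ANY_ACCESS

    // Mirrors the CTL_CODE macro:
    // (DeviceType << 16) | (Access << 14) | (Function << 2) | Method
    public static uint CtlCode(uint deviceType, uint function, uint method, uint access)
    {
        return (deviceType << 16) | (access << 14) | (function << 2) | method;
    }

    // FSCTL_SET_SPARSE is function number 49 on the file-system device.
    public static readonly uint FSCTL_SET_SPARSE =
        CtlCode(FILE_DEVICE_FILE_SYSTEM, 49, METHOD_BUFFERED, FILE_SPECIAL_ACCESS);
}
```

    SparseIoctl.FSCTL_SET_SPARSE evaluates to 0x900C4, which is 590020 in decimal, so a named constant like this could replace the literal in MarkAsSparseFile.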
    

    Once a file has been marked as sparse, it behaves the way you expected in the comments, too: you don't need to write a byte to set the file to a given size.

    static void Main()
    {
        string filename = @"e:\Test\test.dat";
    
        using (FileStream fs = new FileStream(filename, FileMode.Create))
        {
            MarkAsSparseFile(fs.SafeFileHandle);
    
            fs.SetLength(1024 * 1024 * 25);
        }
    }
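
    To confirm that the flag took effect, one option is to read the file's attributes back. This is a small sketch: the helper name IsMarkedSparse is hypothetical, while File.GetAttributes and FileAttributes.SparseFile are standard .NET APIs. On file systems that do not report the sparse attribute, it would simply return false.

```csharp
using System;
using System.IO;

static class SparseCheck
{
    // Returns true when the file system reports the sparse attribute for the file.
    public static bool IsMarkedSparse(string path)
    {
        FileAttributes attributes = File.GetAttributes(path);
        return (attributes & FileAttributes.SparseFile) != 0;
    }
}
```

    For the file created above, SparseCheck.IsMarkedSparse(@"e:\Test\test.dat") should return true on an NTFS volume, and false for an ordinary file that was never marked.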
    
