I've run into a behavior I can't explain.
I'm creating a custom archive file. I'm using a MemoryStream to store the data, and finally I use a FileStream to write the data to disk.
My hard disk is an SSD, but the write was far too slow. When I tried to write only 95 MB to a file, it took 12 seconds!
I tried FileStream.Write and File.WriteAllBytes, but the result is the same.
In the end I tried copying the stream instead, and it was 100x faster!
I need to know why this is happening and what's wrong with the write functions.
Here's my code:
// First of all I create an example 150 MB byte array
Random randomgen = new Random();
byte[] new_byte_array = new byte[150000000];
randomgen.NextBytes(new_byte_array);
// I turn the byte array into a MemoryStream
MemoryStream file1 = new MemoryStream(new_byte_array);
// HERE I DO SOME THINGS WITH THE MEMORYSTREAM
// (the three methods below are alternatives; each was run separately)
// Method 1 : File.WriteAllBytes | 13,944 ms
byte[] output = file1.ToArray();
File.WriteAllBytes("output.test", output);
// Method 2 : FileStream.Write | 8,471 ms
byte[] output = file1.ToArray();
FileStream outfile = new FileStream("outputfile", FileMode.Create, FileAccess.ReadWrite);
outfile.Write(output, 0, output.Length);
// Method 3 : CopyTo into a FileStream | 147 ms !!!!
FileStream outfile = new FileStream("outputfile", FileMode.Create, FileAccess.ReadWrite);
file1.CopyTo(outfile);
Also, file1.ToArray() only takes 90 ms to convert the MemoryStream to bytes.
Why is this happening, and what is the logic behind it?
Dmytro Mukalov is right. The performance you gain by enlarging FileStream's internal buffer is lost again when you do the actual Flush. I dug a bit deeper and ran some benchmarks, and it seems that the difference between Stream.CopyTo and FileStream.Write is that Stream.CopyTo uses the I/O buffer more smartly and boosts performance by copying chunk by chunk. In the end, CopyTo uses Write under the hood. The optimum buffer size has been discussed here.
Optimum buffer size is related to a number of things: file system block size, CPU cache size, and cache latency. Most file systems are configured to use block sizes of 4096 or 8192. In theory, if you configure your buffer size so you are reading a few bytes more than the disk block, the operations with the file system can be extremely inefficient (i.e. if you configured your buffer to read 4100 bytes at a time, each read would require 2 block reads by the file system). If the blocks are already in cache, then you wind up paying the price of RAM -> L3/L2 cache latency. If you are unlucky and the blocks are not in cache yet, you pay the price of the disk->RAM latency as well.
So, to answer your question: in your case you are using an unoptimized buffer size with Write and an optimized one with CopyTo, or, better to say, Stream itself optimizes that for you.
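To make the "chunk by chunk" point concrete, here is a minimal sketch of what a chunked copy loop looks like. It is not the framework's actual source: ChunkedCopy, CopyInChunks and the chunkSize parameter are names made up for this illustration, and the 81920 default is an assumption chosen to match what I believe is the default copy buffer size of Stream.CopyTo. Some runtimes also override CopyTo for MemoryStream, so the real call may take a different path.

```csharp
using System;
using System.IO;

static class ChunkedCopy
{
    // Hypothetical helper: copies all remaining bytes from source to
    // destination through a fixed-size intermediate buffer.
    // chunkSize = 81920 is an assumption for this sketch.
    public static long CopyInChunks(Stream source, Stream destination, int chunkSize = 81920)
    {
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        int read;

        // Read returns 0 once the source is exhausted.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Each iteration ends in an ordinary Write call of at most chunkSize bytes.
            destination.Write(buffer, 0, read);
            total += read;
        }

        return total;
    }
}
```

Calling ChunkedCopy.CopyInChunks(file1, outfile) would then issue many moderately sized Write calls instead of a single 150 MB one.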
In general, you can also force an unoptimized CopyTo by enlarging FileStream's internal buffer; in that case the results should be comparably slow to the unoptimized Write.
FileStream outfile = new FileStream("outputfile",
FileMode.Create,
FileAccess.ReadWrite,
FileShare.Read,
150000000); //internal buffer will lead to inefficient disk write
file1.CopyTo(outfile);
outfile.Flush(); //don't forget to flush data to disk
I analyzed the Write methods of FileStream and MemoryStream. The point is that MemoryStream always uses an internal buffer to copy data, and it is extremely fast. FileStream itself has a switch on whether the requested count >= bufferSize, which is true in your case because you are using the default FileStream buffer (the default buffer size is 4096). In that case FileStream doesn't use the buffer at all but calls the native Win32Native.WriteFile directly.
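The switch described above can be modelled roughly as follows. This is a simplified sketch of the condition only, not the real FileStream code (which, among other things, also has to deal with data already sitting in the buffer); WritePathModel and BypassesBuffer are hypothetical names.

```csharp
static class WritePathModel
{
    // Hypothetical model of the decision described above: a single Write of
    // `count` bytes skips the internal buffer when count >= bufferSize.
    public static bool BypassesBuffer(int count, int bufferSize)
    {
        return count >= bufferSize;
    }
}
```

With the default buffer, BypassesBuffer(150000000, 4096) is true, so the whole array is handed to the native write in one call. With a buffer of 150000001 bytes (the array length plus one) it is false, so the data goes through the buffer first.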
The trick is to force FileStream to use the buffer by overriding the default buffer size. Try this:
// Method 2 : FileStream | 8,471 ms
byte[] output = file1.ToArray();
FileStream outfile = new FileStream("outputfile",
FileMode.Create,
FileAccess.ReadWrite,
FileShare.Read,
output.Length + 1); // important, the size of the buffer
outfile.Write(output, 0, output.Length);
N.B. I am not saying this is the optimal buffer size; it is just an explanation of what is going on. To find the best buffer size for FileStream, refer to link.
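If you want to measure this on your own machine rather than take my numbers, something like the following sketch could be a starting point. BufferSizeBenchmark, TimeWrite and Run are hypothetical names, the 16 MB payload and the list of buffer sizes are arbitrary choices, and a single run per size is only a rough indication, since OS caching and disk state will influence the result.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class BufferSizeBenchmark
{
    // Writes `data` to a temporary file using the given FileStream buffer
    // size and returns the elapsed time, including the flush to disk.
    public static TimeSpan TimeWrite(byte[] data, int bufferSize)
    {
        string path = Path.GetTempFileName();
        try
        {
            Stopwatch sw = Stopwatch.StartNew();
            using (FileStream fs = new FileStream(path, FileMode.Create,
                       FileAccess.Write, FileShare.None, bufferSize))
            {
                fs.Write(data, 0, data.Length);
                fs.Flush(true); // ask the OS to push its buffers to disk as well
            }
            sw.Stop();
            return sw.Elapsed;
        }
        finally
        {
            File.Delete(path); // clean up the temporary file
        }
    }

    // Prints one timing per buffer size; the sizes are illustrative only.
    public static void Run()
    {
        byte[] data = new byte[16 * 1024 * 1024];
        new Random(42).NextBytes(data);

        foreach (int size in new[] { 4096, 81920, 1 << 20, data.Length + 1 })
        {
            TimeSpan elapsed = TimeWrite(data, size);
            Console.WriteLine("{0,10} bytes: {1:F0} ms", size, elapsed.TotalMilliseconds);
        }
    }
}
```

Because the timing includes Flush(true), it should also show whether a gain from a huge buffer survives the actual flush.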