I need to read a few bytes at known file offsets in very large files (several gigabytes). Currently I am using this:
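    ' MAX_BUFFER is the FileStream's internal buffer size, in bytes
    Using fsSrc As New FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, MAX_BUFFER)
        For i = 0 To k.startas.Count - 1
            ' Jump to the start of entry i (offsets are sorted ascending)
            currentpos = fsSrc.Seek(k.startas(i), SeekOrigin.Begin)
            ' Read the 32-byte header; Read returns the number of bytes actually read
            currentpos = fsSrc.Read(buferis, 0, 32)
            ' Length = 16-bit little-endian value at bytes 8..9, plus &H16
            k.ilgis(i) = buferis(8) + buferis(9) * 256 + &H16
        Next
    End Using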
This part is very slow. The MAX_BUFFER constant is quite large, yet the operation is still slow. Even finding and indexing the entries is faster: there I read the file in megabyte chunks and search inside them. But opening the file and seeking to each position is very slow. It is not a random read: I sort the "k" database so that all read positions are in ascending order and there is no need to "rewind" the file position. Another part of the program works the same way, reading chunks of data of arbitrary length from known positions and writing them to another file, and it is very slow too. The algorithm is the same: seek, read, write.
For those wondering: on a 6.2 GB file, reading the file in big chunks and checking every byte for a 7-byte sequence takes 15 s and finds 5890 entries. Meanwhile, reading the same file (part of which must already be in the cache) and getting 32 bytes from each entry takes 115 s. There are several types of indexes, with up to hundreds of thousands of entries. The slowest part is seek-read (and write).
These are the stats for indexing a 6 GB file (there are several indexes and only one seek-and-read). Yet the main CPU load is in "seek". The indexing could be sped up with multiple threads, but what about "seek" and "read byte"?
Problem solved! Thanks to the comments: MAX_BUFFER must be very small here.
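For anyone landing here later, a minimal sketch of the same loop with a small stream buffer. A likely explanation (not confirmed in the thread) is that each Seek to a position outside the buffered range discards the stream's buffer, so the next Read refills up to MAX_BUFFER bytes just to return 32. The names `ReadLengths`, `SMALL_BUFFER`, `offsets` and `header` are made up for illustration, and 4096 is only an assumed starting value, not a measured optimum.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module SmallBufferRead
    ' Hypothetical helper: for each offset, reads a 32-byte header and returns
    ' the 16-bit little-endian value at bytes 8..9 plus &H16.
    Function ReadLengths(filename As String, offsets As IList(Of Long)) As Integer()
        Const SMALL_BUFFER As Integer = 4096 ' assumption: tune by measuring
        Dim lengths(offsets.Count - 1) As Integer
        Dim header(31) As Byte

        Using fs As New FileStream(filename, FileMode.Open, FileAccess.Read,
                                   FileShare.Read, SMALL_BUFFER)
            For i As Integer = 0 To offsets.Count - 1
                fs.Seek(offsets(i), SeekOrigin.Begin)

                ' Read may return fewer bytes than requested, so loop until
                ' the header is complete or the file ends.
                Dim got As Integer = 0
                While got < header.Length
                    Dim n As Integer = fs.Read(header, got, header.Length - got)
                    If n = 0 Then Exit While
                    got += n
                End While
                ' Bytes 8 and 9 are needed below, so at least 10 bytes must be present.
                If got < 10 Then Throw New EndOfStreamException()

                lengths(i) = header(8) + header(9) * 256 + &H16
            Next
        End Using
        Return lengths
    End Function
End Module
```

The only change that matters for speed is the last constructor argument; the inner loop just guards against short reads.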
You need to tune the buffer size to the task at hand.
I usually start at 32768 bytes* and see if increasing or decreasing that improves the performance.
At a guess, using larger buffers could slow performance if the buffer memory needs to be zeroed for each call but the buffer isn't actually used/needed.
* Because I think I once read somewhere that the buffer size used in Windows is 16384 bytes, and I often find that 32768 helps a little. Even if that's out of date or inaccurate, it seems to be a good starting point.
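The tuning approach described above could be sketched roughly like this: time the same seek-and-read pass with several buffer sizes and keep whichever is fastest. `TimeReads` and `CompareBufferSizes` are hypothetical names, and the list of candidate sizes is arbitrary.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.IO

Module BufferSizeTiming
    ' Hypothetical helper: times one pass of seek + 32-byte read over all
    ' offsets for a given FileStream buffer size.
    Function TimeReads(filename As String, offsets As IList(Of Long),
                       bufferSize As Integer) As TimeSpan
        Dim scratch(31) As Byte
        Dim watch As Stopwatch = Stopwatch.StartNew()
        Using fs As New FileStream(filename, FileMode.Open, FileAccess.Read,
                                   FileShare.Read, bufferSize)
            For Each offset As Long In offsets
                fs.Seek(offset, SeekOrigin.Begin)
                ' The data itself is ignored; only the elapsed time matters here.
                fs.Read(scratch, 0, scratch.Length)
            Next
        End Using
        watch.Stop()
        Return watch.Elapsed
    End Function

    Sub CompareBufferSizes(filename As String, offsets As IList(Of Long))
        ' Candidate sizes are arbitrary; start near 32768 and try both directions.
        For Each size As Integer In {4096, 16384, 32768, 65536, 1048576}
            Console.WriteLine("{0,8} bytes: {1}", size, TimeReads(filename, offsets, size))
        Next
    End Sub
End Module
```

Keep in mind that the operating system's file cache warms up after the first pass, so later runs may look faster for reasons unrelated to the buffer size. Repeating the measurements, or running them in a different order, gives a fairer comparison.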