I am working with large files, and writing directly to disk is slow. Because the file is large, I cannot load it into a TMemoryStream.
TFileStream is not buffered, so I want to know whether there is a custom library that offers buffered streams, or whether I should rely only on the buffering offered by the OS. Is the OS buffering reliable? I mean, if the cache is full, an old file (mine) might be flushed from the cache to make room for a new file.
My file is in the GB range and contains millions of records. Unfortunately, the records are not of fixed size, so I have to do millions of small reads (between 4 and 500 bytes each). The reading (and the writing) is sequential; I don't jump back and forth in the file, which I think is ideal for buffering.
In the end, I have to write the file back to disk (again, millions of small writes).
David provided his personal library, which offers buffered disk access.
Speed tests:
Input file: 317MB.SFF
Delphi stream: 9.84 sec
David's stream: 2.05 sec
______________________________________
More tests:
Input file: input2_700MB.txt
Lines: 19 million
Compiler optimization: ON
I/O check: On
FastMM: release mode
**HDD**
Reading: **linear** (ReadLine) (PS: multiply the times by 10)
We see a clear performance drop at 8KB. Recommended: 16 or 32KB.
Time: 618 ms Cache size: 64KB.
Time: 622 ms Cache size: 128KB.
Time: 622 ms Cache size: 24KB.
Time: 622 ms Cache size: 32KB.
Time: 622 ms Cache size: 64KB.
Time: 624 ms Cache size: 256KB.
Time: 625 ms Cache size: 18KB.
Time: 626 ms Cache size: 26KB.
Time: 626 ms Cache size: 1024KB.
Time: 626 ms Cache size: 16KB.
Time: 628 ms Cache size: 42KB.
Time: 644 ms Cache size: 8KB. <--- no difference until 8K
Time: 664 ms Cache size: 4KB.
Time: 705 ms Cache size: 2KB.
Time: 791 ms Cache size: 1KB.
Time: 795 ms Cache size: 1KB.
**SSD**
We see a small improvement as we move towards larger buffers. Recommended: 16 or 32KB.
Time: 610 ms Cache size: 128KB.
Time: 611 ms Cache size: 256KB.
Time: 614 ms Cache size: 32KB.
Time: 623 ms Cache size: 16KB.
Time: 625 ms Cache size: 66KB.
Time: 639 ms Cache size: 8KB. <--- definitely not good with 8K
Time: 660 ms Cache size: 4KB.
______
Reading: **Random** (ReadInteger) (100000 reads)
**SSD**
Time: 064 ms. Cache size: 1KB. Count: 100000. RAM: 13.27 MB <-- probably the best buffer size for ReadInteger is 4 bytes!
Time: 067 ms. Cache size: 2KB. Count: 100000. RAM: 13.27 MB
Time: 080 ms. Cache size: 4KB. Count: 100000. RAM: 13.27 MB
Time: 098 ms. Cache size: 8KB. Count: 100000. RAM: 13.27 MB
Time: 140 ms. Cache size: 16KB. Count: 100000. RAM: 13.27 MB
Time: 213 ms. Cache size: 32KB. Count: 100000. RAM: 13.27 MB
Time: 360 ms. Cache size: 64KB. Count: 100000. RAM: 13.27 MB
Conclusion: don't use the buffered stream for "random" reading; the larger the buffer, the slower each read becomes.
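For anyone who wants to repeat the sequential measurements, a timing loop could look roughly like the sketch below. David's library is not shown in this post, so the sketch uses System.Classes.TBufferedFileStream (Delphi 10.1 Berlin or later) as a stand-in. The 64-byte chunk size, the list of buffer sizes, and the program name are my own assumptions, not the setup used for the numbers above.

```delphi
program BufferSizeBench;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Diagnostics;

// Reads the whole file in small chunks through a buffered stream and
// returns the elapsed time in milliseconds.
// Assumption: 64-byte reads stand in for "record-sized" access.
function TimeSequentialRead(const FileName: string; BufferSize: Integer): Int64;
var
  Stream: TBufferedFileStream;
  Chunk: array[0..63] of Byte;
  Watch: TStopwatch;
begin
  Stream := TBufferedFileStream.Create(FileName,
    fmOpenRead or fmShareDenyWrite, BufferSize);
  try
    Watch := TStopwatch.StartNew;
    while Stream.Read(Chunk, SizeOf(Chunk)) > 0 do
      ; // discard the data; only the read cost is measured
    Result := Watch.ElapsedMilliseconds;
  finally
    Stream.Free;
  end;
end;

const
  // Hypothetical set of buffer sizes to try, in KB.
  SizesKB: array[0..5] of Integer = (1, 4, 8, 16, 32, 64);
var
  KB: Integer;
begin
  if ParamCount <> 1 then
  begin
    Writeln('Usage: BufferSizeBench <file>');
    Exit;
  end;
  for KB in SizesKB do
    Writeln(Format('Time: %d ms  Cache size: %dKB.',
      [TimeSequentialRead(ParamStr(1), KB * 1024), KB]));
end.
```

Keep in mind that after the first pass the file may be served from the OS cache, so later runs can look faster than a cold read from disk.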
Update 2020:
When reading sequentially, the new System.Classes.TBufferedFileStream seems to be 70% faster than the library presented above.
For everybody's interest: Embarcadero added TBufferedFileStream
(see the documentation) in the latest Release of Delphi 10.1 Berlin.
Unfortunately, I can't say how it compares with the solutions given here, as I haven't bought the update yet. I am also aware that the question was asked about Delphi 7, but I am sure the reference to Delphi's own implementation can be useful in the future.
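As a rough illustration of how TBufferedFileStream might be used for the sequential, variable-size reads described in the question, here is a minimal sketch. The record layout (a 2-byte length followed by the payload) is purely hypothetical, since the question does not describe the file format, and the 32KB buffer simply follows the size recommended by the measurements above.

```delphi
program BufferedReadDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical record layout (NOT from the original post): each record is
// a 2-byte length (Word) followed by that many payload bytes.
procedure ReadAllRecords(const FileName: string);
var
  Stream: TBufferedFileStream;
  Len: Word;
  Payload: TBytes;
  Total, Done, Count: Int64;
begin
  Count := 0;
  Done := 0;
  Stream := TBufferedFileStream.Create(FileName,
    fmOpenRead or fmShareDenyWrite, 32 * 1024); // 32KB buffer
  try
    Total := Stream.Size; // query once, then track progress manually
    while Done < Total do
    begin
      // ReadBuffer raises EReadError if the file ends mid-record.
      Stream.ReadBuffer(Len, SizeOf(Len));
      SetLength(Payload, Len);
      if Len > 0 then
        Stream.ReadBuffer(Payload[0], Len);
      Inc(Done, SizeOf(Len) + Len);
      Inc(Count);
      // ... process Payload here ...
    end;
  finally
    Stream.Free;
  end;
  Writeln('Records read: ', Count);
end;

begin
  if ParamCount = 1 then
    ReadAllRecords(ParamStr(1))
  else
    Writeln('Usage: BufferedReadDemo <file>');
end.
```

The small ReadBuffer calls are served from the in-memory buffer, so the disk is only touched when the buffer needs refilling. Writing works the same way with fmCreate and WriteBuffer; the remaining buffered data is written out when the stream is freed.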