c pascal vms openvms

Read file fast (stream_lf format)


Looking for a way to speed up reading and processing a large text file (basically csv; stream_lf).

Should I bypass RMS? Solution may be asynchronous or synchronous.

Current implementation is synchronous, but is too slow.

The implementation is in HP Pascal and uses the Pascal run-time library (OPEN/READLN/EOF/CLOSE). Bypassing the Pascal run-time library is acceptable.

Examples may be in C or Pascal.


Solution

  • The system block count was set to 32. I tried SET RMS/BLOCK=32/BUF=8. That already gave an improvement.

    [edit: If there is no process setting, then the system setting is used. So the test added buffers, but did not make them bigger.]

    32 blocks is just 16KB. Great for 1992, lame for 2012. If more buffers already helped, then larger buffers are likely to help even more. The larger the better. Multiples of 8KB may help just a bit extra. Thus try 128, and also try 255, at the SET RMS process level. If it brings happiness, then you may want to adapt the program to select its own RMS settings and not rely on DCL settings.
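    To put numbers on those sizes: a VMS disk block is 512 bytes, so a buffer is the block count times 512. Here is a minimal C sketch of that arithmetic only. The function names are made up for illustration and are not RMS calls.

```c
#include <stddef.h>

/* A VMS disk block is 512 bytes. */
#define BLOCK_BYTES 512u

/* Size in bytes of one buffer for a given block count (hypothetical helper). */
size_t buffer_bytes(unsigned blocks)
{
    return (size_t)blocks * BLOCK_BYTES;
}

/* Total memory for all buffers, e.g. BLOCK=32 with BUF=8 (hypothetical helper). */
size_t total_buffer_bytes(unsigned blocks, unsigned buffers)
{
    return buffer_bytes(blocks) * buffers;
}

/* Returns 1 when the buffer size is a whole multiple of 8KB (16 blocks), else 0. */
int is_multiple_of_8k(unsigned blocks)
{
    return buffer_bytes(blocks) % 8192u == 0;
}
```

    With these helpers, 32 blocks come out at 16KB, 128 blocks at 64KB (a multiple of 8KB) and 255 blocks at 127.5KB (not a multiple of 8KB). The tested BLOCK=32/BUF=8 combination adds up to 128KB of buffer space in total.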

    The RMS $GET call will normally only get a single record, but you could 'lie' about the file, with SET FIL/ATTR=(RFM=UDF) or perhaps (RFM=FIX,LRL=8192). You can do that temporarily in a program using SYS$MODIFY. After that you can read in big chunks, but your program will need to decode the real records inside the spoofed records. That will be much like using SYS$READ / SYS$QIOW (block I/O), but sticking to record mode will give you free 'read ahead'. Yeah, you can code that yourself with async I/O, but that's a hassle.
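    The "decode the real records" step could look roughly like the C sketch below. It is portable C with no RMS calls in it: it assumes some other code hands it each big chunk (for example one spoofed 8192-byte record) and it splits out the LF-terminated records, carrying a partial record over to the next chunk. All names (decoder, decode_chunk, tally_record, MAX_RECORD) are hypothetical, and the record size limit is an assumption to adjust to your data.

```c
#include <stddef.h>
#include <string.h>

/* Assumed upper bound on one real record; adjust to the data. */
#define MAX_RECORD 32767

/* Called once per decoded record; the LF terminator is not included. */
typedef void (*record_fn)(const char *rec, size_t len, void *ctx);

struct decoder {
    char   carry[MAX_RECORD]; /* partial record left over from the previous chunk */
    size_t carry_len;
};

void decoder_init(struct decoder *d)
{
    d->carry_len = 0;
}

/* Split one chunk into LF-terminated records.
   Returns 0 on success, -1 if a record would exceed MAX_RECORD. */
int decode_chunk(struct decoder *d, const char *chunk, size_t n,
                 record_fn fn, void *ctx)
{
    size_t start = 0;

    while (start < n) {
        const char *lf = memchr(chunk + start, '\n', n - start);

        if (lf == NULL) {
            /* No terminator in the rest of this chunk: save the tail. */
            size_t tail = n - start;
            if (d->carry_len + tail > MAX_RECORD)
                return -1;
            memcpy(d->carry + d->carry_len, chunk + start, tail);
            d->carry_len += tail;
            return 0;
        }

        size_t len = (size_t)(lf - (chunk + start));
        if (d->carry_len > 0) {
            /* Record started in an earlier chunk: finish it in the carry buffer. */
            if (d->carry_len + len > MAX_RECORD)
                return -1;
            memcpy(d->carry + d->carry_len, chunk + start, len);
            fn(d->carry, d->carry_len + len, ctx);
            d->carry_len = 0;
        } else {
            /* Record lies entirely inside this chunk: pass it without copying. */
            fn(chunk + start, len, ctx);
        }
        start += len + 1; /* skip the record and its LF */
    }
    return 0;
}

/* Call after the last chunk to deliver a final record that has no trailing LF. */
void decoder_finish(struct decoder *d, record_fn fn, void *ctx)
{
    if (d->carry_len > 0) {
        fn(d->carry, d->carry_len, ctx);
        d->carry_len = 0;
    }
}

/* Example callback: counts records and payload bytes. */
struct tally {
    size_t records;
    size_t bytes;
};

void tally_record(const char *rec, size_t len, void *ctx)
{
    struct tally *t = ctx;
    (void)rec;
    t->records++;
    t->bytes += len;
}
```

    The caller would run decoder_init once, call decode_chunk for every chunk it reads, and call decoder_finish at end of file. Records that fit inside one chunk are passed to the callback without copying, so the extra cost over plain block reads is one memchr scan per record.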

    Btw... don't go crazy on the number of buffers. In benchmarks (many years ago) I saw little or negative benefit with more than 10 or so. The reason is that RMS does 'read ahead' but not 'keep ahead'. It fills all buffers asynchronously, but then posts no additional reads as buffers get processed. Only when all data is consumed will it re-issue I/Os for all buffers, instead of trying to keep ahead as buffers are processed. Those 'waves' of I/Os can confuse the storage subsystem, and the first I/O in the wave may be slowed down by the rest of the wave... so the program waits.

    How much data is in play? Tens of megabytes, or gigabytes? Will the XFC cache have a chance to cache it between the exports and the processing?

    Kind regards, Hein.