smalltalk, smalltalkx

Optimal stream size (ReadStream, WriteStream, etc.)


I'm writing a program that generates a file, and I'm wondering what the best practices are for streams, especially when it comes to size. I imagine that a stream that grows too large could cause slowdowns or other performance issues.

I have the following code, which could be called many, many times; the collection can also be huge. I presume one should handle different sizes differently, e.g. <1 MB, 10 MB, 100 MB, 1-10 GB, and >10 GB.

writeIntoStream: anInputStringCollection
    "Collects every string into one in-memory UTF-16 stream and answers that stream"

    | aWriteStream |
    aWriteStream := WriteStream on: '' asUnicode16String.
    anInputStringCollection do: [ :string |
        aWriteStream nextPutAllUnicode: string asUnicode16String
    ].
    ^ aWriteStream

What are the best practices? For example, should one care whether the data fits on the heap or the stack?

For now I've concluded that if I use a maximum of 5kB for a stream (or collection) it is fast enough and it works (for Smalltalk/X).

I would like to know the limits and the internals for different Smalltalk flavours. (I have not run any tests and could not find any articles about it.)

Edit: First, thank you everyone (@LeandroCaniglia, @JayK, @aka.nice). In the very first version, the slowdowns were caused by far too many operations (open, write, close), because the file was written line by line:

write: newString to: aFile
    "Writes keyName, keyValue to a file"

    "/ aFile is UTF16-LE (Little Endian) Without Signature (BOM)
    aFile appendingFileDo: [ :stream | 
        stream nextPutAllUtf16Bytes: newString MSB: false
    ]

The second version was much faster but still not correct. It used an intermediate stream whose contents were written to the file in chunks:

write: aWriteStream to: aFile
    "Writes everything written to the stream"

    "/ aFile is UTF16-LE Without Signature
    aFile appendingFileDo: [ :stream | "/ withoutTrailingSeparators must be there as Stream puts spaces at the end
        stream nextPutAllUtf16Bytes: (aWriteStream contents withoutTrailingSeparators) MSB: false
    ]

The third version came after Leandro's answer and your advice. (I looked at the buffer: its size is defined as __stringSize(aCollection); when the available buffer/memory is exhausted, it is written to the file.) I have removed #write:to: altogether, and the stream is now defined as:

anAppendFileStream := aFile appendingWriteStream.

Every method that takes part in writing to the stream now uses:

anAppendFileStream nextPutUtf16Bytes: aCharacter MSB: false.

or

anAppendFileStream nextPutAllUtf16Bytes: string MSB: false
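Put together, the third version might look roughly like the sketch below. The selector #writeAll:to: is a hypothetical name introduced only for illustration; #appendingWriteStream and #nextPutAllUtf16Bytes:MSB: are the Smalltalk/X messages used in the snippets above, and the sketch assumes the stream understands the usual #close.

```smalltalk
writeAll: anInputStringCollection to: aFile
    "Hypothetical helper: opens the append stream once, writes every string
     as UTF-16 LE, and closes the stream even if an error is raised."

    | anAppendFileStream |
    anAppendFileStream := aFile appendingWriteStream.
    [
        anInputStringCollection do: [ :string |
            anAppendFileStream nextPutAllUtf16Bytes: string MSB: false
        ]
    ] ensure: [ anAppendFileStream close ]
```

The point of the sketch is that the file is opened and closed once per collection rather than once per string, and no intermediate in-memory stream is built.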

As for the buffer size itself:

There is buffer-size logic that guesses the required buffer length, e.g. in #nextPutAll:, bufLen = (sepLen == 1) ? len : (len + ((len/4) + 1) * sepLen);, where sepLen is defined based on the separator size (EOF, cr, crlf).

There can be different buffer sizes for different methods, e.g. #copyToEndFrom: uses bufferSize := 1 * 1024 on Windows and bufferSize := 8 * 1024 on *nix (1 kB and 8 kB, respectively).
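To make the buffer-length guess quoted for #nextPutAll: concrete, here is the same expression transcribed into a Smalltalk block. This is only a worked illustration, not the library's code: the block name guessBufLen is made up, and it assumes len and sepLen are non-negative integers, so that Smalltalk's // matches integer division in the C expression.

```smalltalk
| guessBufLen |

"Transcription of: bufLen = (sepLen == 1) ? len : (len + ((len/4) + 1) * sepLen)
 The extra parentheses are needed because Smalltalk evaluates binary
 operators strictly left to right."
guessBufLen := [ :len :sepLen |
    sepLen = 1
        ifTrue: [ len ]
        ifFalse: [ len + (((len // 4) + 1) * sepLen) ] ].

guessBufLen value: 100 value: 1.   "100: separator of length 1, no extra room"
guessBufLen value: 100 value: 2.   "152: 100 + ((25 + 1) * 2)"
```

So for a separator longer than one character, the guess reserves room for roughly one separator per four characters of input, plus one.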


Solution

  • You are asking for best practices, so in that regard I would say that the best practice is to dump data on streams regardless of whether the particular stream is associated with a file or not. In your case, this means that you shouldn't be using an intermediate stream before getting to the real one on disk.

    Now, given the performance issue you encountered, my recommendation would be to better understand the cause of it as opposed to finding a workaround, as you are trying to do.

    In the case of streams, the main reason for a nextPutAll: operation to perform poorly is that the particular flavor of the particular message, nextPutAllUnicode: in your case, is not taking advantage of the optimizations built into the specific stream class.

    More precisely, most streams optimize nextPutAll: (and friends) by dumping the data argument in one operation. This is usually much faster than the semantically equivalent iteration:

    data do: [:token | stream nextPut: token]
    

    which not only sends many more messages than the single operation optimization, it also exacerbates the time taken by FFI, etc.

    So, to give you a hint on a line of action, my suggestion would be to debug the code and see why nextPutAllUnicode: is not being optimized, and with that understanding change your code so that it will allow the single operation to happen.
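
    A rough way to see the difference is to time both variants on an in-memory stream. This is only a sketch: it assumes your dialect provides Time millisecondsToRun: and String new:withAll:, the one-million-character size is arbitrary, and the numbers will vary by dialect and stream class.

    ```smalltalk
    | data perElement bulk |
    data := String new: 1000000 withAll: $a.

    "Element by element: one nextPut: send per character."
    perElement := Time millisecondsToRun: [
        | stream |
        stream := WriteStream on: String new.
        data do: [ :token | stream nextPut: token ] ].

    "Bulk: a single nextPutAll: send for the whole collection."
    bulk := Time millisecondsToRun: [
        | stream |
        stream := WriteStream on: String new.
        stream nextPutAll: data ].

    Transcript
        show: 'nextPut: loop: ', perElement printString, ' ms'; cr;
        show: 'nextPutAll:    ', bulk printString, ' ms'; cr.
    ```

    Repeating the same measurement with nextPutAllUnicode: on your actual stream class would show whether that message falls back to the per-element path.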