Tags: http, download, stream, smalltalk, squeak

Writing chunks of a large HTTP response to disk as soon as chunks arrive, in Squeak


I am trying to download files to disk from Squeak. My method worked fine for small text/html files, but because it does no buffering it was very slow for the large binary file https://mirror.racket-lang.org/installers/6.12/racket-6.12-x86_64-win32.exe. Also, after it finished, the file was much larger (113 MB) than the size shown on the download page (75 MB).

My code looks like this:

download: anURL 
    "download a file over HTTP and save it to disk under a name extracted from url."
    | ios name |
    name := ((anURL findTokens: '/') removeLast findTokens: '?') removeFirst.
    ios := FileStream oldFileNamed: name.
    ios  nextPutAll: ((HTTPClient httpGetDocument: anURL) content).
    ios close.
    Transcript show: 'done'; cr.
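The size growth (113 MB instead of 75 MB, roughly 1.5 times) is most likely an encoding effect rather than extra data. In recent Squeak images FileStream answers a MultiByteFileStream, which in text mode UTF-8-encodes every character above 127 as two bytes, and about half the bytes of a binary file fall in that range. A minimal sketch of the same method with that addressed, assuming an image where FileStream behaves this way: it switches the output to binary mode and uses forceNewFileNamed:, which creates (or replaces) the file, instead of oldFileNamed:, which requires the file to exist already. It still reads the whole body into memory, so it does not fix the speed problem.

```smalltalk
download: anURL
	"Download a file over HTTP and save it under a name extracted from the URL.
	Sketch: binary mode writes the bytes unchanged; the body is still held in memory in full."
	| ios name |
	name := ((anURL findTokens: '/') removeLast findTokens: '?') removeFirst.
	ios := FileStream forceNewFileNamed: name.
	[ios binary.
	 ios nextPutAll: (HTTPClient httpGetDocument: anURL) content]
		ensure: [ios close].
	Transcript show: 'done'; cr
```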

I have tried reading fixed-size blocks from the HTTP response's contentStream in a [stream atEnd] whileFalse: loop, writing each one with [bytes := stream next: bufSize. bytes printTo: ios], but that garbled the output file: each block was wrapped in single quotes, and after the blocks came extra content that looked like every character of the stream, each in single quotes.
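The quotes are what printing produces: printing a String writes its printString, which wraps the text in single quotes, so each block lands in the file quoted. Writing a block's elements needs nextPutAll: instead. The workspace sketch below shows the difference (using printOn:, the standard printing method) and a fixed-size block copy; the in-memory ReadStream and WriteStream stand in for the HTTP content stream and the output file, and the 4096-byte block size is an arbitrary choice.

```smalltalk
"Workspace sketch: printing a block versus writing its elements."
| quoted src dst block |
quoted := WriteStream on: String new.
'abc' printOn: quoted.
Transcript show: quoted contents; cr.    "shows 'abc' including the quotes"

"Fixed-size block copy: next: answers at most 4096 elements
(fewer on the last pass) and nextPutAll: writes them unchanged."
src := ReadStream on: (ByteArray new: 10000 withAll: 200).
dst := WriteStream on: ByteArray new.
[src atEnd] whileFalse: [
	block := src next: 4096.
	dst nextPutAll: block].
Transcript show: dst contents size printString; cr.    "shows 10000"
```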

How can I implement buffered writing of an HTTP response to a disk file? Also, is there a way to do this in Squeak while showing download progress?


Solution

  • When building the response content, the WebResponse class creates a buffer large enough to hold the entire response, even for huge responses! I think this is caused by the code in WebMessage>>#getContentWithProgress:.

    I tried copying data from the WebResponse's input SocketStream directly to an output FileStream. This required subclassing WebClient and WebResponse and writing two methods. The following code now works as required.

    | client link |
    client := PkWebClient new.
    link := 'http://localhost:8000/racket-6.12-x86_64-linux.sh'.
    client download: link toFile: '/home/yo/test'.
    

    I have verified that the file is written block by block and that the downloaded file is intact.

    The source is below. The method streamContentDirectToFile: is the one that bypasses the in-memory buffer and solves the problem.

    WebClient subclass: #PkWebClient
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'PK'!
    !PkWebClient commentStamp: 'pk 3/28/2018 20:16' prior: 0!
    Trying to download http directly to file.!
    
    
    !PkWebClient methodsFor: 'as yet unclassified' stamp: 'pk 3/29/2018 13:29'!
    download: urlString toFile: aFilePathString 
        "Try to download large files sensibly"
        | res |
        res := self httpGet: urlString.
        res := PkWebResponse new copySameFrom: res.
        res streamContentDirectToFile: aFilePathString! !
    
    
    WebResponse subclass: #PkWebResponse
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'PK'!
    !PkWebResponse commentStamp: 'pk 3/28/2018 20:49' prior: 0!
    To make getContentWithProgress: better.!
    
    
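    !PkWebResponse methodsFor: 'as yet unclassified' stamp: 'pk 3/29/2018 13:20'!
    streamContentDirectToFile: aFilePathString 
        "Stream the response's content directly to a file, one buffer at a time,
        instead of collecting the whole body in memory first."
        | buffer ostream |
        stream binary.
        "forceNewFileNamed: creates the file or replaces an existing one; oldFileNamed:
        fails when the file is missing and does not truncate a longer existing file."
        ostream := FileStream forceNewFileNamed: aFilePathString.
        [ostream binary.
        [stream atEnd]
            whileFalse: [
                "Take what is already buffered (at most 4096 bytes), then pull in whatever has arrived on the socket."
                buffer := stream nextInBuffer: 4096.
                stream receiveAvailableData.
                ostream nextPutAll: buffer]]
            ensure: [
                "Close both streams even if the transfer fails."
                stream close.
                ostream close]! !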
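For the progress part of the question, one option is a variant of streamContentDirectToFile: that reports how many bytes it has written so far after each non-empty block. This is a hypothetical sketch: the selector streamContentDirectToFile:progress: and the 'downloading' category are not part of WebClient. It reuses the same SocketStream calls as the method above (nextInBuffer: and receiveAvailableData) and leaves the display to the caller's block.

```smalltalk
!PkWebResponse methodsFor: 'downloading'!
streamContentDirectToFile: aFilePathString progress: aBlock
	"Hypothetical sketch: stream the response's content to a file as streamContentDirectToFile: does,
	evaluating aBlock with the total number of bytes written after each non-empty block."
	| buffer ostream total |
	total := 0.
	stream binary.
	ostream := FileStream forceNewFileNamed: aFilePathString.
	[ostream binary.
	[stream atEnd] whileFalse: [
		buffer := stream nextInBuffer: 4096.
		stream receiveAvailableData.
		buffer isEmpty ifFalse: [
			ostream nextPutAll: buffer.
			total := total + buffer size.
			aBlock value: total]]]
		ensure: [
			stream close.
			ostream close]! !
```

From download:toFile: it could be called as res streamContentDirectToFile: aFilePathString progress: [:bytes | Transcript show: bytes printString; cr]. Printing on every 4 KB block is noisy for a 75 MB file, so a real caller would probably throttle the updates; if the server sends a Content-Length header, the caller could also turn the count into a percentage.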