lisp, common-lisp, cffi

How to build a byte buffer in CFFI?


I am retrieving a huge string over HTTP via Curl. Because of this, I can’t call cffi:foreign-string-to-lisp in the curl write callback, since not all data of the string is available at the moment when the callback is called – and calling cffi:foreign-string-to-lisp for only parts of the string yields an "invalid UTF-8" error.

Therefore I need to somehow copy the data received as a CFFI pointer to some buffer; with C this would be easy, just use realloc. But how can I have some sort of growing buffer in Lisp with CFFI?

(cffi:defcallback cb-write :unsigned-int ((ptr :pointer) (size :unsigned-int)
                                          (nmemb :unsigned-int) (stream :pointer))
  (let* ((data-size (* size nmemb)))

    ;; Copy the data from the pointer "ptr" somewhere – it is "data-size" bytes long.

    data-size))

Solution

  • The size of the buffer (probably a vector of unsigned bytes, so an array) can be changed in CL with ADJUST-ARRAY. If the array was created as adjustable (with :adjustable t), this function adjusts the original array and returns it; otherwise it may return a new array. If only the new dimensions and an :initial-element argument are supplied, the existing contents of the array are kept.

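    The array is adjusted automatically when needed if your vector is adjustable and has a fill pointer, and you add elements with VECTOR-PUSH-EXTEND. For a buffer that starts empty, create it with :fill-pointer 0; :fill-pointer t sets the fill pointer to the full length of the vector.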

    This should answer the question as directly asked. In the context presented, I would probably want to process the data as it arrives, to avoid building a huge binary buffer (it could be gigabytes), but I have no good solution for that with CFFI and the UTF-8 conversion functions I know of. Wrapping the binary data in a Gray stream and processing it as a stream source might work, but it is overly complicated. So in the end, I believe that using a native HTTP client implementation, or even reading the output of a curl process as Kaz suggested, might be easier and in fact even faster.

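    If you know that the data is (for example) just ASCII and not UTF-8, you can sidestep all these problems by changing the default encoding (cffi:*default-foreign-encoding*), since a single-byte encoding has no multi-byte sequences that could be split across chunks.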

    Incidentally, I would say you have found a bug in the CFFI documentation, which boldly uses foreign-string-to-lisp in this context, so you may want to raise it with the authors. But maybe they were just striving for simplicity.
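
The growing-buffer approach described above could look roughly like the following. This is a minimal sketch in plain Common Lisp, not a tested curl integration: the names make-byte-buffer, append-chunk and *buffer* are made up for illustration, and the incoming chunks are simulated with literal vectors instead of a foreign pointer.

```lisp
;; Hypothetical helper: an empty, growable byte buffer.
;; :adjustable t plus :fill-pointer 0 is what lets VECTOR-PUSH-EXTEND grow it.
(defun make-byte-buffer (&optional (initial-capacity 4096))
  (make-array initial-capacity
              :element-type '(unsigned-byte 8)
              :adjustable t
              :fill-pointer 0))

;; Hypothetical helper: append every octet of CHUNK to BUFFER.
;; VECTOR-PUSH-EXTEND enlarges the underlying array whenever it is full.
;; Assumption: in the real curl callback, each octet would be read from the
;; foreign pointer instead, e.g. (cffi:mem-aref ptr :unsigned-char i)
;; for i from 0 below data-size.
(defun append-chunk (buffer chunk)
  (loop for octet across chunk
        do (vector-push-extend octet buffer))
  buffer)

;; Simulated transfer. The character "é" is the UTF-8 sequence #xC3 #xA9;
;; here it is split across two chunks, which is exactly the situation in
;; which decoding each chunk separately signals an invalid UTF-8 error.
(defparameter *buffer* (make-byte-buffer 2))   ; tiny capacity to force growth
(append-chunk *buffer* #(72 195))              ; "H" and the first byte of "é"
(append-chunk *buffer* #(169 33))              ; second byte of "é" and "!"

(length *buffer*)                              ; => 4
```

Once the transfer has finished, the complete buffer can be decoded in one go, for example with babel:octets-to-string and :encoding :utf-8, assuming Babel (a dependency of CFFI) is loaded. Pushing one octet at a time keeps the sketch short; for very large responses it may be worth growing the array in bigger steps with ADJUST-ARRAY and copying whole chunks.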