Convert a flexi-stream containing a vector of octets to a UTF-8 string in Common Lisp


I use the following code to convert the content of a flexi-stream to a UTF-8 string:

(babel:octets-to-string (slot-value (getf request :raw-body) 'vector)
                        :encoding :utf-8)

Inspecting :raw-body in the request plist shows the following object.

#<FLEXI-STREAMS::VECTOR-INPUT-STREAM {7007F67D63}>
--------------------
Class: #<STANDARD-CLASS FLEXI-STREAMS::VECTOR-INPUT-STREAM>
--------------------
All Slots:
[ ]  END         = 169
[ ]  INDEX       = 0
[ ]  OPEN-P      = T
[ ]  TRANSFORMER = NIL
[ ]  VECTOR      = @6=#(112 114 111 99 101 115 115..)

It looks odd/wrong to me to extract the vector slot from the stream and convert it, instead of reading sequentially from the stream.

Am I doing it "wrong"? If so, why? How does one access streams idiomatically in Common Lisp?


Solution

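  • One problem with using the slot value like this is that it doesn't consume the contents of the input stream: the stream's INDEX stays at 0, so anything that reads from the stream later sees the same bytes again. It also depends on an internal slot of FLEXI-STREAMS::VECTOR-INPUT-STREAM (the double colon in the printed class name shows that the symbol is not exported), so it may break if the library changes.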

    Flexi-streams and friends aren't standard Common Lisp, but you can use read-byte and read-sequence with them.
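
    To see what consuming the stream means in practice, here is a minimal sketch. It assumes Quicklisp is available to load flexi-streams, and demo-consumption is a hypothetical name used only for illustration. The stream is built with flexi-streams:with-input-from-sequence, which should give the same kind of in-memory input stream as the one in the question:

    ```lisp
    ;; Assumption: Quicklisp is installed and loaded in the running image.
    (eval-when (:compile-toplevel :load-toplevel :execute)
      (ql:quickload :flexi-streams :silent t))

    (defun demo-consumption ()
      "Read a four-octet in-memory stream with read-byte and read-sequence.
    Returns (values first-octet end-position exhausted-p)."
      (flexi-streams:with-input-from-sequence (in #(112 114 111 99)) ; \"proc\"
        (let* ((first-octet (read-byte in))          ; consumes one octet
               (buffer (make-array 8 :element-type '(unsigned-byte 8)))
               ;; read-sequence returns the index of the first element of
               ;; BUFFER that was not updated, i.e. how many octets it stored.
               (end (read-sequence buffer in)))      ; consumes the other three
          ;; With EOF-ERROR-P set to NIL, read-byte returns NIL at end of stream.
          (values first-octet end (null (read-byte in nil))))))
    ```

    Calling (demo-consumption) returns 112, 3 and T: one octet read by read-byte, three more stored by read-sequence, and nothing left afterwards. Reading the VECTOR slot directly would leave the stream untouched.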

    The simplest solution is probably to use a library function like read-stream-content-into-byte-vector from Alexandria:

    CL-USER> (babel:octets-to-string
              (alexandria:read-stream-content-into-byte-vector (getf request :raw-body)))
    

    You could write your own function to get a string from a stream using read-byte. Here is a simple function that just reads a byte at a time from the input stream and shoves it into a vector until there are no more bytes to read. This is not written for performance, but rather to get the thing done:

    (defun stream->string (stream &key (initial-size 1024))
      ;; Collect octets one at a time; read-byte returns NIL at end of stream.
      (do ((buffer (make-array initial-size
                               :element-type '(unsigned-byte 8)
                               :adjustable t
                               :fill-pointer 0))
           (b (read-byte stream nil) (read-byte stream nil)))
          ((null b) (babel:octets-to-string buffer))
        (vector-push-extend b buffer)))
    

    A better and somewhat more involved solution could use read-sequence to read larger blocks of bytes at a time. Here is one way to write that; I would expect this to have better performance than the version above that uses read-byte, but I haven't tested that:

    (defun stream->string (stream &key (initial-size 1024))
      (do* ((buffer (make-array initial-size
                                :element-type '(unsigned-byte 8)
                                :adjustable t
                                :fill-pointer t))
            (buffer-size initial-size)
            (next (read-sequence buffer stream)))
           ((< next buffer-size)
            (setf (fill-pointer buffer) next)
            (return (babel:octets-to-string buffer)))
        (setf buffer-size (* 2 buffer-size))
        (adjust-array buffer buffer-size :fill-pointer t)
        (setf next (read-sequence buffer stream :start next))))
    

    This last version reads bytes into an array using read-sequence. If the array is filled during this operation, it is resized by doubling its capacity. next keeps track of the next position to be written to in the array. When this value is smaller than the current buffer-size, the rest of the stream fit into the buffer without filling it, so the stream has been exhausted. The fill pointer is then set to the end of the buffer contents before passing the buffer to babel:octets-to-string.
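
    The interplay between the fill pointer and adjust-array described above can be checked in isolation with standard Common Lisp only. This is a minimal sketch, not part of the original answer; grow-demo is a hypothetical name and the sizes 4, 8 and 5 are arbitrary:

    ```lisp
    (defun grow-demo ()
      "Return (values length-before capacity-after length-after length-trimmed)."
      (let* ((buffer (make-array 4 :element-type '(unsigned-byte 8)
                                   :adjustable t
                                   :fill-pointer t)) ; fill pointer starts at 4
             (before (length buffer)))               ; LENGTH reports the fill pointer
        ;; Double the capacity and move the fill pointer to the new end, so a
        ;; later read-sequence is allowed to write into the whole array.
        (setf buffer (adjust-array buffer 8 :fill-pointer t))
        (let ((capacity (array-dimension buffer 0))
              (after (length buffer)))
          ;; Pretend only 5 octets were actually read: trim the visible part.
          (setf (fill-pointer buffer) 5)
          (values before capacity after (length buffer)))))
    ```

    (grow-demo) returns 4, 8, 8 and 5. Because sequence functions treat the fill pointer as the end of the vector, read-sequence only fills up to it, and babel:octets-to-string only sees the octets below it once it has been set back to next.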