Search code examples
haskellvectorbytestringstorable

Writing storable instance for CString with O(1) function to get total byte length


I am trying to write a storable vector instance for CString (null-terminated C chars in my case). The storable instance will store the pointers that the CString is (Ptr CChar). So, length of the vector is the number of CString pointers. Now, the reason I am writing this storable instance is because it will be used to do zero-copy from FFI CString, and then fast ByteString build using unsafeCreate (after some transformation - so, we use fast vectors here for intermediate operations). In order to do fast ByteString build, three things are needed for storable instance:

  • Total length in bytes - the storable instance needs to have book-keeeping allocation for storing the length of each CString when adding it to the vector, and total length of CString stored so far. Let us say total length of C Strings can't exceed 2^31. So, Int32/Word32 will do to store the length of each CString, and total length.
  • Function to store CString and its length - O(n) time. This function will walk the CString, and store its length, and also, increment the total length by the length of the CString.
  • Functon to return length in total bytes - O(1) time. This function will just retrieve value from the field that stores the total length

While I know how to write custom storable instance, I don't know how to handle this kind of case. A simple code (can be a simple toy example) that shows how to do custom bookkeeping, and write function to store/get bookkeeping results will be very much appreciated.

Update 1 (Clarification)

The reason for using storable vector instance in my case is two-fold: fast computation/transformation using unboxed types (on real-time data received through C FFI), and fast conversion to bytestring (to send out the data in real-time over IPC to another program). For fast bytestring conversion, unsafeCreate is excellent. But, we must know how much to allocate, and also, pass it a function for transformation. Given a storable vector instance (with mixed types - I simplified my question above to just CString type), it is easy for me to build a fast transform function that walks each element of the vector and transforms it to bytestring. Then, we simply pass it to unsafeCreate. But, we must also pass it number of bytes to allocate. A O(n) recursive byte length calculation function is too slow, and can double the overhead of building bytestring.


Solution

  • It sounds like you want to write something like this. Note that this code is not tested.

    -- The basic type.  Export the type but not the constructors or 
    -- accessors from the module.
    data StringVector {
       strVecLength :: Word32,               -- Total length
       strVecContents [(Word32, Ptr CChar)]  -- (Length, value) pairs
    }
    
    -- Invariants: forall (StringVector len contents), 
    --    len == sum (map fst) contents
    --    all (\p -> fst p == c_strlen (snd p)) contents
    
    
    -- The null case.
    emptyStrVec :: StringVector
    emptyStrVec = StringVector 0 []
    
    
    -- Put a new Cstring at the head of the vector.  Analogous to ":".
    stringVectorCons :: Ptr CChar -> StringVector -> StringVector
    stringVectorCons ptr (StringVector len pairs) = 
       StringVector (len + n) $ (n, ptr) : pairs
       where
          n = c_strlen ptr   -- Or whatever the right function name is
    
    
    -- Extract the head of the vector and the remaining vector.
    stringVectorUncons :: StringVector -> ((Word32, Ptr CChar), StringVector)
    stringVectorUncons (StringVector len (h:t)) =
       (h, StringVector (len - fst h) t)
    

    After that you can add whatever other functions you might want, depending on the application. Just make sure that each function preserves the invariants.