Search code examples
haskellvectorstorable

Defining Storable for Recursive Data Structure Involving Vectors


I have a data structure of the form below (V is Data.Storable.Vector):

data Elems = I {-# UNPACK #-} !GHC.Int.Int32
             | S {-# UNPACK #-} !GHC.Int.Int32 {-# UNPACK #-} !(Ptr CChar)
             | T {-# UNPACK #-} !(V.Vector Elems)
                deriving (Show)

I first wrote a custom storable definition for non-recursive form (i.e., without T constructor). Then, I tried to add custom peek and poke definition for T using ForeignPtr and length information from Vector (code is below). The GHC compiler complains about Storable instance not being defined for ForeignPtr Elems type. My question is if it is possible to store ptr to Vector in Storable definition, without being forced to write Storable instance definition for ForeignPtr.

From Haddocs documentation, ForeignPtr seems to be just a Ptr with a Finalizer assigned to it:

The essential difference between ForeignPtrs and vanilla memory references of type Ptr a is that the former may be associated with finalizers.

I don't want to work around the issue by using Ptr instead of ForeignPtr, because of issues of finalizing it. So, I prefer storing location of ForeignPtr (through Ptr (ForeignPtr a)) so that GHC garbage collector knows about the reference to it. But, that approach would force me to define a Storable instance (because of constraint (Storable a) => Ptr a which makes sense).

Is there a way to store and retrieve ptr to a Vector in Storable, without defining Storable instance for ForeignPtr? If there isn't, then writing the Storable definition of ForeignPtr is a must. In that case, what would it look like? My guess is it will just store a Ptr to ForeignPtr.

The full code below:

{-# LANGUAGE MagicHash #-}
import qualified Data.Vector.Storable as V
import Foreign
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (lengthArray0)
import GHC.Int

data Elems = I {-# UNPACK #-} !GHC.Int.Int32
             | S {-# UNPACK #-} !GHC.Int.Int32 {-# UNPACK #-} !(Ptr CChar)
             | T {-# UNPACK #-} !(V.Vector Elems)
                deriving (Show)

instance Storable Elems where
  sizeOf _ = sizeOf (undefined :: Word8) + sizeOf (undefined :: Int32) + sizeOf (undefined :: Ptr CChar)
  alignment _ = 4

  {-# INLINE peek #-}
  peek p = do
      let p1 = (castPtr p::Ptr Word8) `plusPtr` 1 -- get pointer to start of the element. First byte is type of element
      t <- peek (castPtr p::Ptr Word8)
      case t of
        1 -> do 
          x <- peek (castPtr p1 :: Ptr GHC.Int.Int32) 
          return (I x)
        2 -> do 
          x <- peek (castPtr p1 :: Ptr GHC.Int.Int32) 
          y <- peek (castPtr (p1 `plusPtr` 4) :: Ptr (Ptr CChar)) -- increment pointer by 4 bytes first
          return (S x y)
        _ -> do
          x <- peek (castPtr p1 :: Ptr Int)
          y <- peek (castPtr (p1 `plusPtr` 8) :: Ptr (ForeignPtr Elems)) 
          return (T (V.unsafeFromForeignPtr y 0 x)) -- return vector

  {-# INLINE poke #-}
  poke p x = case x of
      I a -> do
        poke (castPtr p :: Ptr Word8) 1  
        poke (castPtr p1) a
      S a b -> do
        poke (castPtr p :: Ptr Word8) 2
        poke (castPtr p1) a
        poke (castPtr (p1 `plusPtr` 4)) b -- increment pointer by 4 bytes first
      T x -> do
        poke (castPtr p :: Ptr Word8) 3
        let (fp,_,n) = V.unsafeToForeignPtr x
        poke (castPtr p1) n
        poke (castPtr (p1 `plusPtr` 8)) fp

      where  p1 = (castPtr p :: Ptr Word8) `plusPtr` 1 -- get pointer to start of the element. First byte is type of element

Solution

  • ForeignPtrs cannot be made Storable, because their implementation demands a way to associate one or many finalizer pointers to a raw pointer, and this association is runtime-dependent. To make the ForeignPtr storable, you need to store the associated Ptr (which is easy) and the array of associated finalizers (which is impossible, since the finalizers are runtime-internal, and might bind to the GC of the GHC runtime).

    This is not the problem that needs to be solved here, though.

    The problem is that there's no reasonable way to make something that contains a Vector into something Storable. A Vector demands managed memory for its contents (The definition of Storable.Vector is data Vector a = Vector Int (ForeignPtr a) plus some strictness annotations), but the whole purpose of Storable is to store some value into unmanaged memory. Further, a Vector uses different amounts of memory depending on its length, but Storable data structures must use a constant amount of memory.

    You need to rethink what your data structure is trying to model. Do you really need to store a Vector like this? Remember that you are storing a Vector of Elems, meaning that you can have a value T that contains a Vector that contains a T that contains a Vector that contains a T, etc.

    I think that you might be trying to model the following data structure instead, but I might be wrong:

    data Elems = OneElem Elem | ManyElems (Vector Elem)
    
    data Elem
        = I !GHC.Int.Int32
        | S !GHC.Int.Int32 !(Ptr CChar)
    

    If you really need the recursive data structure that you described, try to implement this instead:

    data Elems
        = I !GHC.Int.Int32
        | S !GHC.Int.Int32 !(Ptr CChar)
        | T !GHC.Int.Int32 !(Ptr Elems)
    

    A pointer to some Elems uses constant memory, and can point to unmanaged memory, so you can create storable instances for it.