Search code examples
chaskellffi

Haskell FFI : Using data types in a C program


I wrote a library with Haskell and I would like it to be used in C programs. I have read some documentation about using the foreign export ccall command and the Foreign module.

I have seen some example such as this one but these examples use common C types like Int or Double.

In my library, I created some data types like :

data OrdSymb = SEQ
             | SLT
             | SGT

or recursives with a supplied type :

data MyType a =
        TypeDouble Double
      | TypeInt Int
      | TypeVar a 
      | TypeAdd (MyType a) (MyType a) 

But I didn't find how to use/export these types with the FFI.

How can I export my own data types to C and use them in the foreign declarations to export my functions ?


Solution

  • The short answer to your question is:

    • As has been suggested here and in the comments to your related question, start by fully designing your C API, including the C data representations and C functions you expect to be able to use from C code. This is not a trivial step. The design decisions you make here will influence the approach you take to exporting (AKA marshaling) these Haskell types out to C and back.
    • Use a "C-to-Haskell" helper to automate the nuts and bolts of sizing, alignment, and structure member access. Either hsc2hs or c2hs or both may be helpful. These tools aren't design for exporting Haskell functions to C, but they are still useful.
    • Expect to spend a lot of time reading up on the whole FFI subsystem, studying the generated output from the aforementioned tools, and writing lots of C glue. Any non-trivial language binding is complex, and the fact that you're trying to expose a library written in a "high level" language through bindings in a "low level" language makes it all the more challenging.

    Now, here's a (very, very) long answer to your question that will actually give you something to work with. First some needed imports:

    -- file: MyLib1.hs
    module MyLib1 where
    
    import Control.Exception.Base
    import Foreign
    import Foreign.C
    

    Binding for an Enum

    Let's start with your first data type, a simple sum type with 0-ary constructors:

    data OrdSymb = SEQ | SLT | SGT deriving (Show, Eq)
    

    To be concrete, let's assume we also have some symbols:

    newtype Symbol = Symbol String
    

    and we'd like to expose Haskell functions with the following signatures:

    compareSymb :: Symbol -> Symbol -> OrdSymb
    compareSymb (Symbol x) (Symbol y) =
      case compare x y of { EQ -> SEQ; LT -> SLT; GT -> SGT }
    
    checkSymb :: Symbol -> OrdSymb -> Symbol -> Bool
    checkSymb x ord y = compareSymb x y == ord
    

    Admittedly, checkSymb is stupid, but I wanted to show examples of functions that both produce OrdSymb results and accept OrdSymb arguments.

    Here's the C interface we'd like to have for these data types and functions. A natural C representation for a sum type with 0-ary constructors is an enum, so we get something like the following:

    enum ord_symb {
            SLT = -1,
            SEQ = 0,
            SGT = 1
    };
    

    Symbols can just be represented by pointers to NUL-terminated C strings:

    typedef char* symbol;
    

    and the signatures for the exported functions will look something like:

    enum ord_symb compare_symb(symbol x, symbol y);
    bool check_symb(symbol x, enum ord_symb ord, symbol y);
    

    Here's how to create the C language binding completely manually, with no C-to-Haskell helper. It's a little tedious, but seeing it will help you understand what's going on under the hood.

    We'll need an explicit mapping, for the OrdSymb type, between the Haskell constructor representation (SLT, SEQ, and SGT) and the C representation as an integer (-1, 0, or 1). You could do this with a couple of plain functions (e.g., toOrdSymb and fromOrdSymb), though Haskell provides an Enum class with some functions that fit this description:

    instance Enum OrdSymb where
    
      toEnum (-1) = SLT
      toEnum 0    = SEQ
      toEnum 1    = SGT
    
      fromEnum SLT = -1
      fromEnum SEQ = 0
      fromEnum SGT = 1
    

    For documentation purposes, it's also helpful to define a type to represent the C-side type enum ord_symb. The C standard says that enums have the same representation as ints, so we'll write:

    type C_OrdSymb = CInt
    

    Now, because OrdSymb is a simple type, it might make sense to create a Storable instance that can marshal its values to and from a C enum ord_symb in preallocated memory. That would look like this:

    instance Storable OrdSymb where
      sizeOf _ = sizeOf (undefined :: C_OrdSymb)
      alignment _ = alignment (undefined :: C_OrdSymb)
      peek ptr = genToEnum <$> peek (castPtr ptr :: Ptr C_OrdSymb)
      poke ptr val = poke (castPtr ptr :: Ptr C_OrdSymb) (genFromEnum val)
    

    where we've used the helper functions:

    genToEnum :: (Integral a, Enum b) => a -> b
    genToEnum = toEnum . fromIntegral
    
    genFromEnum :: (Integral a, Enum b) => b -> a
    genFromEnum = fromIntegral . fromEnum
    

    The peek and poke here just wrap the corresponding methods for plain CInts, and they use the toEnum and fromEnum methods defined above to perform the actual transformation.

    Note that this Storable instance isn't technically required. We can marshal OrdSymbs in and out of C enum ord_symbs without such an instance, and in fact in the examples below, that's what we'll do. However, the Storable could come in handy if we later have to work with a C structure that contains an enum ord_symb member, or if we find we're marshaling arrays of enum ord_symbs or something.

    However, it's worth bearing in mind that -- generally speaking -- objects that are marshaled to and from C don't need to be Storable, and making something Storable doesn't magically take care of all the details of marshaling. In particular, if we tried to write a Storable instance for Symbol, we'd run into trouble. Storables are supposed to be of predetermined length, so sizeOf isn't supposed to inspect it's argument. However, a Symbol's size depends on the underlying string, so unless we decide to implement a maximum string length and store all Symbols that way, we shouldn't use a Storable instance here. Instead, let's write some marshaling functions for Symbols without the benefit of the Storable class:

    peekSymbol :: Ptr Symbol -> IO Symbol
    peekSymbol ptr = Symbol <$> peekCString (castPtr ptr)
    
    newSymbol :: Symbol -> IO (Ptr Symbol)
    newSymbol (Symbol str) = castPtr <$> newCString str
    
    freeSymbol :: Ptr Symbol -> IO ()
    freeSymbol = free
    

    Note that we don't "poke" symbols, because we don't normally have a pre-allocated buffer of the correct size into which we're writing the symbol. Instead, when we want to marshal a Symbol out to C, we'll need to allocate a new C string for it, and that's what newSymbol does. To avoid a memory leak, we'll need to call freeSymbol (or just free) on symbols once we're done with them (or let the user of our C bindings know that they are responsible for calling the C function free on the pointer). This also means it may be helpful to write a helper that can be used to wrap a computation that uses a marshalled symbol without leaking the memory. Again, this is something we won't actually use in this example, but it's a helpful sort of thing to define:

    withSymbol :: Symbol -> (Ptr Symbol -> IO a) -> IO a
    withSymbol sym = bracket (newSymbol sym) freeSymbol
    

    Now, we can export our Haskell functions by writing wrappers that perform the marshaling:

    mylib_compare_symb :: Ptr Symbol -> Ptr Symbol -> IO C_OrdSymb
    mylib_compare_symb px py = do
      x <- peekSymbol px
      y <- peekSymbol py
      return $ genFromEnum (compareSymb x y)
    
    mylib_check_symb :: Ptr Symbol -> C_OrdSymb -> Ptr Symbol -> IO CInt
    mylib_check_symb px ord py = do
      x <- peekSymbol px
      y <- peekSymbol py
      return $ genFromEnum (checkSymb x (genToEnum ord) y)
    

    Note that the genFromEnum in the last line is for the Enum instance for Haskell's Bool type, to turn false/true into 0/1.

    Also, it's maybe worth noting that, for these wrappers, we didn't use any Storable instances at all!

    Finally, we can export the wrapper functions to C.

    foreign export ccall mylib_compare_symb
      :: Ptr Symbol -> Ptr Symbol -> IO C_OrdSymb
    foreign export ccall mylib_check_symb
      :: Ptr Symbol -> C_OrdSymb -> Ptr Symbol -> IO CInt
    

    If you put all of the above Haskell code into MyLib1.hs, create mylib.h, example1.c, and ffitypes.cabal with contents as follows:

    // file: mylib.h
    #ifndef MYLIB_H
    #define MYLIB_H
    
    enum ord_symb {
            SLT = -1,
            SEQ = 0,
            SGT = 1
    };
    typedef char* symbol;   // NUL-terminated string
    
    // don't need these signatures -- they'll be autogenerated into
    // MyLib1_stub.h
    //enum ord_symb compare_symb(symbol x, symbol y);
    //bool check_symb(symbol x, enum ord_symb ord, symbol y);
    
    #endif
    

    and:

    // file: example1.c
    #include <HsFFI.h>
    #include "MyLib1_stub.h"
    #include <stdio.h>
    
    #include "mylib.h"
    
    int main(int argc, char *argv[])
    {
        hs_init(&argc, &argv);
    
        symbol foo = "foo";
        symbol bar = "bar";
    
        printf("%s\n", mylib_compare_symb(foo, bar) == SGT ? "pass" : "fail");
        printf("%s\n", mylib_check_symb(foo, SGT, bar) ? "pass" : "fail");
        printf("%s\n", mylib_check_symb(foo, SEQ, bar) ? "fail" : "pass");
    
        hs_exit();
        return 0;
    }
    

    and:

    -- file: ffitypes.cabal
    name:                 ffitypes
    version:              0.1.0.0
    cabal-version:        >= 1.22
    build-type:           Simple
    
    executable example1
      main-is:            example1.c
      other-modules:      MyLib1
      include-dirs:       .
      includes:           mylib.h
      build-depends:      base
      default-language:   Haskell2010
      cc-options:         -Wall -O
      ghc-options:        -Wall -Wno-incomplete-patterns -O
    

    and put everything in a fresh ffitypes directory. Then, from that directory:

    $ stack init
    $ stack build
    $ stack exec example1
    

    should work to run the example.

    Marshaling a Parameterized, Recursive Type

    Now, let's turn to your more complicated MyType. I've changed the Int to an Int32 so it'll match a CInt on typical platforms.

    data MyType a =
            TypeDouble Double
          | TypeInt Int32
          | TypeVar a 
          | TypeAdd (MyType a) (MyType a)
    

    This is a sum type with unary and binary constructors, an arbitrary type parameter a, and recursive structure, so pretty complicated. Again, it's important to begin by specifying a concrete C implementation. A C union can be used to store a complicated sum type, but we'll also want to "tag" the union with a enum to indicate which constructor the union is representing, so the C type will look something like this:

    typedef struct mytype_s {
            enum mytype_cons_e {
                    TYPEDOUBLE,
                    TYPEINT,
                    TYPEVAR,
                    TYPEADD
            } mytype_cons;
            union {
                    double type_double;
                    int type_int;
                    void* type_var;
                    struct {
                            struct mytype_s *left;
                            struct mytype_s *right;
                    } type_add;
            } mytype_value;
    } mytype;
    

    Note that, to allow the C bindings to work with MyType as with multiple possible parameters a, we need to use a void* for the type_var union member.

    Writing the marshalling functions for MyType entirely manually is very painful and error prone. There are a lot of details about the exact size, alignment, and layout of C structures that you'd need to get right. Instead, we'll use the c2hs helper package. We'll start with a little preamble at the top of a new MyLib2.chs:

    -- file: MyLib2.chs
    module MyLib2 where
    
    import Foreign
    import Foreign.C
    
    #include "mylib.h"
    

    The c2hs package is great for working with enums. For example, creating marshalling infrastructure for the enum mytype_cons_e tag with this package looks like this:

    -- file: MyLib2.chs
    {#enum mytype_cons_e as MyTypeCons {}#}
    

    Note that this automatically retrieves the definition from the C header mylib.h, creates a Haskell definition equivalent to:

    -- data MyTypeCons = TYPEDOUBLE | TYPEINT | etc.
    

    and defines the needed Enum instance to map the Haskell constructors to and from the integer values on the C side. It'll be useful to have our generalized toEnum and fromEnum helpers here, too:

    genToEnum :: (Integral a, Enum b) => a -> b
    genToEnum = toEnum . fromIntegral
    
    genFromEnum :: (Integral a, Enum b) => b -> a
    genFromEnum = fromIntegral . fromEnum
    

    Now, let's look at marshalling your data type:

    data MyType a =
      TypeDouble Double
      | TypeInt Int32
      | TypeVar a 
      | TypeAdd (MyType a) (MyType a)
    

    to and from a struct mytype_s. One warning: these implementations assume that the recursive constructor TypeAdd and its C analogue type_add are never used to create "cycles" on the C or Haskell side. Handling data structures that are recursive in the sense that let x = 0:x is recursive would require a different approach.

    Because struct mytype_s is a fixed-length structure, you might think it would be a good candidate for a Storable instance, but that turns out not to be the case. Because of the embedded pointer in the type_var union member and the recursive pointers in the type_add member, it's not possible to write a reasonable Storable instance for MyType. We could write one for:

    data C_MyType a =
      C_TypeDouble Double
      | C_TypeInt Int32
      | C_TypeVar (Ptr a)
      | C_TypeAdd (Ptr (MyType a)) (Ptr (MyType a))
    

    where the pointers have been made explicit. When we marshall this, we'll assume we've already marshalled the "child" nodes and have pointers to them that we can marshall out to the structure. For the C_TypeAdd constructor, I could have written this instead:

      -- C_TypeAdd (Ptr (C_MyType a)) (Ptr (C_MyType a))
    

    It doesn't really matter, since we'll be freely casting Ptrs back and forth between MyTypes and C_MyTypes . I decided to use my definition because it got rid of two castPtr calls.

    The Storable instance for C_MyType looks like this. Note how c2hs allows us to look up sizes, alignments, and offsets automatically. We'd have to calculate these all manually otherwise.

    instance Storable (C_MyType a) where
      sizeOf _ = {#sizeof mytype_s#}
      alignment _ = {#alignof mytype_s#}
      peek p = do
        typ <- genToEnum <$> {#get struct mytype_s->mytype_cons#} p
        case typ of
          TYPEDOUBLE ->
            C_TypeDouble . (\(CDouble x) -> x)
            <$> {#get struct mytype_s->mytype_value.type_double#} p
          TYPEINT    ->
            C_TypeInt    . (\(CInt    x) -> x)
            <$> {#get struct mytype_s->mytype_value.type_int   #} p
          TYPEVAR    ->
            C_TypeVar . castPtr <$> {#get struct mytype_s->mytype_value.type_var#} p
          TYPEADD    -> do
            q1 <- {#get struct mytype_s->mytype_value.type_add.left#} p
            q2 <- {#get struct mytype_s->mytype_value.type_add.right#} p
            return $ C_TypeAdd (castPtr q1) (castPtr q2)
      poke p t = case t of
        C_TypeDouble x -> do
          tag TYPEDOUBLE
          {#set struct mytype_s->mytype_value.type_double#} p (CDouble x)
        C_TypeInt x    -> do
          tag TYPEINT
          {#set struct mytype_s->mytype_value.type_int   #} p (CInt    x)
        C_TypeVar q    -> do
          tag TYPEVAR
          {#set struct mytype_s->mytype_value.type_var   #} p (castPtr q)
        C_TypeAdd q1 q2 -> do
          tag TYPEADD
          {#set struct mytype_s->mytype_value.type_add.left #} p (castPtr q1)
          {#set struct mytype_s->mytype_value.type_add.right#} p (castPtr q2)
    
        where
          tag = {#set struct mytype_s->mytype_cons#} p . genFromEnum
    

    With the Storable instance for C_MyType out of the way, the marshalling functions for a real MyType look pretty clean:

    peekMyType :: (Ptr a -> IO a) -> Ptr (MyType a) -> IO (MyType a)
    peekMyType peekA p = do
      ct <- peek (castPtr p)
      case ct of
        C_TypeDouble x -> return $ TypeDouble x
        C_TypeInt    x -> return $ TypeInt    x
        C_TypeVar    q -> TypeVar <$> peekA q
        C_TypeAdd q1 q2 -> do
          t1 <- peekMyType peekA q1
          t2 <- peekMyType peekA q2
          return $ TypeAdd t1 t2
    
    newMyType :: (a -> IO (Ptr a)) -> MyType a -> IO (Ptr (MyType a))
    newMyType newA t = do
      p <- malloc
      case t of
        TypeDouble x  -> poke p (C_TypeDouble x)
        TypeInt    x  -> poke p (C_TypeInt    x)
        TypeVar    v  -> poke p . C_TypeVar =<< newA v
        TypeAdd t1 t2 -> do
          q1 <- newMyType newA t1
          q2 <- newMyType newA t2
          poke p (C_TypeAdd q1 q2)
      return (castPtr p)  -- case from Ptr C_MyType to Ptr MyType
    
    freeMyType :: (Ptr a -> IO ()) -> Ptr (MyType a) -> IO ()
    freeMyType freeA p = do
      ct <- peek (castPtr p)
      case ct of
        C_TypeVar q -> freeA q
        C_TypeAdd q1 q2 -> do
          freeMyType freeA q1
          freeMyType freeA q2
        _ -> return ()  -- no children to free
      free p
    

    Note how we need to use helpers for the a type. Whenever we want to make a newMyType for a MyType a, we'll need to provide a tailored newA for the a type. It would be possible to make this into a typeclass and even create an instance for all Storable a, but I haven't done that here.

    Now, suppose we have a Haskell function that uses all these data types that we'd like to export to C:

    replaceSymbols :: OrdSymb -> Symbol -> Symbol -> MyType Symbol -> MyType Symbol
    replaceSymbols ord sym1 sym2 = go
      where
        go (TypeVar s) | checkSymb s ord sym1 = TypeVar sym2
        go (TypeAdd t1 t2) = TypeAdd (go t1) (go t2)
        go rest = rest
    

    with the helper functions previously defined:

    compareSymb :: Symbol -> Symbol -> OrdSymb
    compareSymb (Symbol x) (Symbol y) =
      case compare x y of { EQ -> SEQ; LT -> SLT; GT -> SGT }
    
    checkSymb :: Symbol -> OrdSymb -> Symbol -> Bool
    checkSymb x ord y = compareSymb x y == ord
    

    We'll need a few other things in MyLib2.chs. First, we'll use c2hs to define the OrdSymb type (again, this automatically generates the associated data OrdSymb):

    {#enum ord_symb as OrdSymb {} deriving (Show, Eq)#}
    type C_OrdSymb = CInt
    

    and the symbol-marshalling code copied from MyLib1.hs:

    newtype Symbol = Symbol String
    
    peekSymbol :: Ptr Symbol -> IO Symbol
    peekSymbol ptr = Symbol <$> peekCString (castPtr ptr)
    
    newSymbol :: Symbol -> IO (Ptr Symbol)
    newSymbol (Symbol str) = castPtr <$> newCString str
    
    freeSymbol :: Ptr Symbol -> IO ()
    freeSymbol = free
    

    Then, we can write the following C wrapper:

    mylib_replace_symbols :: C_OrdSymb -> Ptr Symbol -> Ptr Symbol
        -> Ptr (MyType Symbol) -> IO (Ptr (MyType Symbol))
    mylib_replace_symbols ord psym1 psym2 pt = do
      sym1 <- peekSymbol psym1
      sym2 <- peekSymbol psym2
      t <- peekMyType peekSymbol pt
      let t' = replaceSymbols (genToEnum ord) sym1 sym2 t
      newMyType newSymbol t'
    

    Given that this returns a malloced data structure, it's helpful to also provide an exported function to free it:

    mylib_free_mytype_symbol :: Ptr (MyType Symbol) -> IO ()
    mylib_free_mytype_symbol = freeMyType freeSymbol
    

    And let's export them:

    foreign export ccall mylib_replace_symbols
      :: C_OrdSymb -> Ptr Symbol -> Ptr Symbol
           -> Ptr (MyType Symbol) -> IO (Ptr (MyType Symbol))
    foreign export ccall mylib_free_mytype_symbol
      :: Ptr (MyType Symbol) -> IO ()
    

    If you take all the Haskell code in this section, starting with the module MyLib2 line and put it in MyLib2.chs, then create/modify the following files:

    // file: mylib.h
    #ifndef MYLIB_H
    #define MYLIB_H
    
    enum ord_symb {
        SLT = -1,
        SEQ = 0,
        SGT = 1
    };
    typedef char* symbol;   // NUL-terminated string
    
    typedef struct mytype_s {
        enum mytype_cons_e {
            TYPEDOUBLE,
            TYPEINT,
            TYPEVAR,
            TYPEADD
        } mytype_cons;
        union {
            double type_double;
            int type_int;
            void* type_var;
            struct {
                struct mytype_s *left;
                struct mytype_s *right;
            } type_add;
        } mytype_value;
    } mytype;
    
    #endif
    

    and:

    // file: example2.c
    #include <HsFFI.h>
    #include "MyLib2_stub.h"
    #include <stdio.h>
    
    #include "mylib.h"
    
    // AST for:   1.0 + foo
    mytype node1 = { TYPEDOUBLE, {type_double: 1.0} };
    mytype node2 = { TYPEVAR,    {type_var: "foo"} };
    mytype root  = { TYPEADD,    {type_add: {&node1, &node2} } };
    
    int main(int argc, char *argv[])
    {
        hs_init(&argc, &argv);
    
        mytype *p1 = mylib_replace_symbols(SEQ, "foo", "bar", &root);
        printf("%s\n",  // should print "bar"
           (char*) p1->mytype_value.type_add.right->mytype_value.type_var);
        mytype *p2 = mylib_replace_symbols(SEQ, "quux", "bar", &root);
        printf("%s\n",  // unchanged -- should still be "foo"
           (char*) p2->mytype_value.type_add.right->mytype_value.type_var);
    
        mylib_free_mytype_symbol(p1);
        mylib_free_mytype_symbol(p2);
    
        hs_exit();
        return 0;
    }
    

    and add the executable example2 clause to your Cabal file:

    -- file: ffitypes.cabal
    name:                 ffitypes
    version:              0.1.0.0
    cabal-version:        >= 1.22
    build-type:           Simple
    
    executable example1
      main-is:            example1.c
      other-modules:      MyLib1
      include-dirs:       .
      includes:           mylib.h
      build-depends:      base
      default-language:   Haskell2010
      cc-options:         -Wall -O
      ghc-options:        -Wall -Wno-incomplete-patterns -O
    
    executable example2
      main-is:            example2.c
      other-modules:      MyLib2
      include-dirs:       .
      includes:           mylib.h
      build-depends:      base
      build-tools:        c2hs
      default-language:   Haskell2010
      cc-options:         -Wall -O
      ghc-options:        -Wall -Wno-incomplete-patterns -O
    

    and stick them all in the ffitypes directory, then you should be able to stack build and stack exec example2.

    Language Bindings are Hard!

    As you can perhaps tell from the code above, it takes a lot of work to create even simple C bindings for Haskell libraries. If it's any consolation, creating Haskell bindings for C libraries is only a little bit easier. Best of luck!