I wrote a library with Haskell and I would like it to be used in C programs. I have read some documentation about using the foreign export ccall
command and the Foreign
module.
I have seen some example such as this one but these examples use common C types like Int
or Double
.
In my library, I created some data types like :
data OrdSymb = SEQ
| SLT
| SGT
or recursives with a supplied type :
data MyType a =
TypeDouble Double
| TypeInt Int
| TypeVar a
| TypeAdd (MyType a) (MyType a)
But I didn't find how to use/export these types with the FFI.
How can I export my own data types to C and use them in the foreign
declarations to export my functions ?
The short answer to your question is:
hsc2hs
or c2hs
or both may be helpful. These tools aren't design for exporting Haskell functions to C, but they are still useful.Now, here's a (very, very) long answer to your question that will actually give you something to work with. First some needed imports:
-- file: MyLib1.hs
module MyLib1 where
import Control.Exception.Base
import Foreign
import Foreign.C
Let's start with your first data type, a simple sum type with 0-ary constructors:
data OrdSymb = SEQ | SLT | SGT deriving (Show, Eq)
To be concrete, let's assume we also have some symbols:
newtype Symbol = Symbol String
and we'd like to expose Haskell functions with the following signatures:
compareSymb :: Symbol -> Symbol -> OrdSymb
compareSymb (Symbol x) (Symbol y) =
case compare x y of { EQ -> SEQ; LT -> SLT; GT -> SGT }
checkSymb :: Symbol -> OrdSymb -> Symbol -> Bool
checkSymb x ord y = compareSymb x y == ord
Admittedly, checkSymb
is stupid, but I wanted to show examples of functions that both produce OrdSymb
results and accept OrdSymb
arguments.
Here's the C interface we'd like to have for these data types and functions. A natural C representation for a sum type with 0-ary constructors is an enum, so we get something like the following:
enum ord_symb {
SLT = -1,
SEQ = 0,
SGT = 1
};
Symbols can just be represented by pointers to NUL-terminated C strings:
typedef char* symbol;
and the signatures for the exported functions will look something like:
enum ord_symb compare_symb(symbol x, symbol y);
bool check_symb(symbol x, enum ord_symb ord, symbol y);
Here's how to create the C language binding completely manually, with no C-to-Haskell helper. It's a little tedious, but seeing it will help you understand what's going on under the hood.
We'll need an explicit mapping, for the OrdSymb
type, between the Haskell constructor representation (SLT
, SEQ
, and SGT
) and the C representation as an integer (-1, 0, or 1). You could do this with a couple of plain functions (e.g., toOrdSymb
and fromOrdSymb
), though Haskell provides an Enum
class with some functions that fit this description:
instance Enum OrdSymb where
toEnum (-1) = SLT
toEnum 0 = SEQ
toEnum 1 = SGT
fromEnum SLT = -1
fromEnum SEQ = 0
fromEnum SGT = 1
For documentation purposes, it's also helpful to define a type to represent the C-side type enum ord_symb
. The C standard says that enums have the same representation as int
s, so we'll write:
type C_OrdSymb = CInt
Now, because OrdSymb
is a simple type, it might make sense to create a Storable
instance that can marshal its values to and from a C enum ord_symb
in preallocated memory. That would look like this:
instance Storable OrdSymb where
sizeOf _ = sizeOf (undefined :: C_OrdSymb)
alignment _ = alignment (undefined :: C_OrdSymb)
peek ptr = genToEnum <$> peek (castPtr ptr :: Ptr C_OrdSymb)
poke ptr val = poke (castPtr ptr :: Ptr C_OrdSymb) (genFromEnum val)
where we've used the helper functions:
genToEnum :: (Integral a, Enum b) => a -> b
genToEnum = toEnum . fromIntegral
genFromEnum :: (Integral a, Enum b) => b -> a
genFromEnum = fromIntegral . fromEnum
The peek
and poke
here just wrap the corresponding methods for plain CInt
s, and they use the toEnum
and fromEnum
methods defined above to perform the actual transformation.
Note that this Storable
instance isn't technically required. We can marshal OrdSymb
s in and out of C enum ord_symb
s without such an instance, and in fact in the examples below, that's what we'll do. However, the Storable
could come in handy if we later have to work with a C structure that contains an enum ord_symb
member, or if we find we're marshaling arrays of enum ord_symb
s or something.
However, it's worth bearing in mind that -- generally speaking -- objects that are marshaled to and from C don't need to be Storable
, and making something Storable
doesn't magically take care of all the details of marshaling. In particular, if we tried to write a Storable
instance for Symbol
, we'd run into trouble. Storable
s are supposed to be of predetermined length, so sizeOf
isn't supposed to inspect it's argument. However, a Symbol
's size depends on the underlying string, so unless we decide to implement a maximum string length and store all Symbol
s that way, we shouldn't use a Storable
instance here. Instead, let's write some marshaling functions for Symbol
s without the benefit of the Storable
class:
peekSymbol :: Ptr Symbol -> IO Symbol
peekSymbol ptr = Symbol <$> peekCString (castPtr ptr)
newSymbol :: Symbol -> IO (Ptr Symbol)
newSymbol (Symbol str) = castPtr <$> newCString str
freeSymbol :: Ptr Symbol -> IO ()
freeSymbol = free
Note that we don't "poke" symbols, because we don't normally have a pre-allocated buffer of the correct size into which we're writing the symbol. Instead, when we want to marshal a Symbol
out to C, we'll need to allocate a new C string for it, and that's what newSymbol
does. To avoid a memory leak, we'll need to call freeSymbol
(or just free
) on symbols once we're done with them (or let the user of our C bindings know that they are responsible for calling the C function free
on the pointer). This also means it may be helpful to write a helper that can be used to wrap a computation that uses a marshalled symbol without leaking the memory. Again, this is something we won't actually use in this example, but it's a helpful sort of thing to define:
withSymbol :: Symbol -> (Ptr Symbol -> IO a) -> IO a
withSymbol sym = bracket (newSymbol sym) freeSymbol
Now, we can export our Haskell functions by writing wrappers that perform the marshaling:
mylib_compare_symb :: Ptr Symbol -> Ptr Symbol -> IO C_OrdSymb
mylib_compare_symb px py = do
x <- peekSymbol px
y <- peekSymbol py
return $ genFromEnum (compareSymb x y)
mylib_check_symb :: Ptr Symbol -> C_OrdSymb -> Ptr Symbol -> IO CInt
mylib_check_symb px ord py = do
x <- peekSymbol px
y <- peekSymbol py
return $ genFromEnum (checkSymb x (genToEnum ord) y)
Note that the genFromEnum
in the last line is for the Enum
instance for Haskell's Bool
type, to turn false/true into 0/1.
Also, it's maybe worth noting that, for these wrappers, we didn't use any Storable
instances at all!
Finally, we can export the wrapper functions to C.
foreign export ccall mylib_compare_symb
:: Ptr Symbol -> Ptr Symbol -> IO C_OrdSymb
foreign export ccall mylib_check_symb
:: Ptr Symbol -> C_OrdSymb -> Ptr Symbol -> IO CInt
If you put all of the above Haskell code into MyLib1.hs
, create
mylib.h
, example1.c
, and ffitypes.cabal
with contents as follows:
// file: mylib.h
#ifndef MYLIB_H
#define MYLIB_H
enum ord_symb {
SLT = -1,
SEQ = 0,
SGT = 1
};
typedef char* symbol; // NUL-terminated string
// don't need these signatures -- they'll be autogenerated into
// MyLib1_stub.h
//enum ord_symb compare_symb(symbol x, symbol y);
//bool check_symb(symbol x, enum ord_symb ord, symbol y);
#endif
and:
// file: example1.c
#include <HsFFI.h>
#include "MyLib1_stub.h"
#include <stdio.h>
#include "mylib.h"
int main(int argc, char *argv[])
{
hs_init(&argc, &argv);
symbol foo = "foo";
symbol bar = "bar";
printf("%s\n", mylib_compare_symb(foo, bar) == SGT ? "pass" : "fail");
printf("%s\n", mylib_check_symb(foo, SGT, bar) ? "pass" : "fail");
printf("%s\n", mylib_check_symb(foo, SEQ, bar) ? "fail" : "pass");
hs_exit();
return 0;
}
and:
-- file: ffitypes.cabal
name: ffitypes
version: 0.1.0.0
cabal-version: >= 1.22
build-type: Simple
executable example1
main-is: example1.c
other-modules: MyLib1
include-dirs: .
includes: mylib.h
build-depends: base
default-language: Haskell2010
cc-options: -Wall -O
ghc-options: -Wall -Wno-incomplete-patterns -O
and put everything in a fresh ffitypes
directory. Then, from that directory:
$ stack init
$ stack build
$ stack exec example1
should work to run the example.
Now, let's turn to your more complicated MyType
. I've changed the Int
to an Int32
so it'll match a CInt
on typical platforms.
data MyType a =
TypeDouble Double
| TypeInt Int32
| TypeVar a
| TypeAdd (MyType a) (MyType a)
This is a sum type with unary and binary constructors, an arbitrary type parameter a
, and recursive structure, so pretty complicated. Again, it's important to begin by specifying a concrete C implementation. A C union can be used to store a complicated sum type, but we'll also want to "tag" the union with a enum to indicate which constructor the union is representing, so the C type will look something like this:
typedef struct mytype_s {
enum mytype_cons_e {
TYPEDOUBLE,
TYPEINT,
TYPEVAR,
TYPEADD
} mytype_cons;
union {
double type_double;
int type_int;
void* type_var;
struct {
struct mytype_s *left;
struct mytype_s *right;
} type_add;
} mytype_value;
} mytype;
Note that, to allow the C bindings to work with MyType a
s with multiple possible parameters a
, we need to use a void*
for the type_var
union member.
Writing the marshalling functions for MyType
entirely manually is very painful and error prone. There are a lot of details about the exact size, alignment, and layout of C structures that you'd need to get right. Instead, we'll use the c2hs
helper package. We'll start with a little preamble at the top of a new MyLib2.chs
:
-- file: MyLib2.chs
module MyLib2 where
import Foreign
import Foreign.C
#include "mylib.h"
The c2hs
package is great for working with enums. For example, creating marshalling infrastructure for the enum mytype_cons_e
tag with this package looks like this:
-- file: MyLib2.chs
{#enum mytype_cons_e as MyTypeCons {}#}
Note that this automatically retrieves the definition from the C header mylib.h
, creates a Haskell definition equivalent to:
-- data MyTypeCons = TYPEDOUBLE | TYPEINT | etc.
and defines the needed Enum
instance to map the Haskell constructors to and from the integer values on the C side. It'll be useful to have our generalized toEnum
and fromEnum
helpers here, too:
genToEnum :: (Integral a, Enum b) => a -> b
genToEnum = toEnum . fromIntegral
genFromEnum :: (Integral a, Enum b) => b -> a
genFromEnum = fromIntegral . fromEnum
Now, let's look at marshalling your data type:
data MyType a =
TypeDouble Double
| TypeInt Int32
| TypeVar a
| TypeAdd (MyType a) (MyType a)
to and from a struct mytype_s
. One warning: these implementations assume that the recursive constructor TypeAdd
and its C analogue type_add
are never used to create "cycles" on the C or Haskell side. Handling data structures that are recursive in the sense that let x = 0:x
is recursive would require a different approach.
Because struct mytype_s
is a fixed-length structure, you might think it would be a good candidate for a Storable
instance, but that turns out not to be the case. Because of the embedded pointer in the type_var
union member and the recursive pointers in the type_add
member, it's not possible to write a reasonable Storable
instance for MyType
. We could write one for:
data C_MyType a =
C_TypeDouble Double
| C_TypeInt Int32
| C_TypeVar (Ptr a)
| C_TypeAdd (Ptr (MyType a)) (Ptr (MyType a))
where the pointers have been made explicit. When we marshall this, we'll assume we've already marshalled the "child" nodes and have pointers to them that we can marshall out to the structure. For the C_TypeAdd
constructor, I could have written this instead:
-- C_TypeAdd (Ptr (C_MyType a)) (Ptr (C_MyType a))
It doesn't really matter, since we'll be freely casting Ptr
s back and forth between MyType
s and C_MyType
s . I decided to use my definition because it got rid of two castPtr
calls.
The Storable
instance for C_MyType
looks like this. Note how c2hs
allows us to look up sizes, alignments, and offsets automatically. We'd have to calculate these all manually otherwise.
instance Storable (C_MyType a) where
sizeOf _ = {#sizeof mytype_s#}
alignment _ = {#alignof mytype_s#}
peek p = do
typ <- genToEnum <$> {#get struct mytype_s->mytype_cons#} p
case typ of
TYPEDOUBLE ->
C_TypeDouble . (\(CDouble x) -> x)
<$> {#get struct mytype_s->mytype_value.type_double#} p
TYPEINT ->
C_TypeInt . (\(CInt x) -> x)
<$> {#get struct mytype_s->mytype_value.type_int #} p
TYPEVAR ->
C_TypeVar . castPtr <$> {#get struct mytype_s->mytype_value.type_var#} p
TYPEADD -> do
q1 <- {#get struct mytype_s->mytype_value.type_add.left#} p
q2 <- {#get struct mytype_s->mytype_value.type_add.right#} p
return $ C_TypeAdd (castPtr q1) (castPtr q2)
poke p t = case t of
C_TypeDouble x -> do
tag TYPEDOUBLE
{#set struct mytype_s->mytype_value.type_double#} p (CDouble x)
C_TypeInt x -> do
tag TYPEINT
{#set struct mytype_s->mytype_value.type_int #} p (CInt x)
C_TypeVar q -> do
tag TYPEVAR
{#set struct mytype_s->mytype_value.type_var #} p (castPtr q)
C_TypeAdd q1 q2 -> do
tag TYPEADD
{#set struct mytype_s->mytype_value.type_add.left #} p (castPtr q1)
{#set struct mytype_s->mytype_value.type_add.right#} p (castPtr q2)
where
tag = {#set struct mytype_s->mytype_cons#} p . genFromEnum
With the Storable
instance for C_MyType
out of the way, the marshalling functions for a real MyType
look pretty clean:
peekMyType :: (Ptr a -> IO a) -> Ptr (MyType a) -> IO (MyType a)
peekMyType peekA p = do
ct <- peek (castPtr p)
case ct of
C_TypeDouble x -> return $ TypeDouble x
C_TypeInt x -> return $ TypeInt x
C_TypeVar q -> TypeVar <$> peekA q
C_TypeAdd q1 q2 -> do
t1 <- peekMyType peekA q1
t2 <- peekMyType peekA q2
return $ TypeAdd t1 t2
newMyType :: (a -> IO (Ptr a)) -> MyType a -> IO (Ptr (MyType a))
newMyType newA t = do
p <- malloc
case t of
TypeDouble x -> poke p (C_TypeDouble x)
TypeInt x -> poke p (C_TypeInt x)
TypeVar v -> poke p . C_TypeVar =<< newA v
TypeAdd t1 t2 -> do
q1 <- newMyType newA t1
q2 <- newMyType newA t2
poke p (C_TypeAdd q1 q2)
return (castPtr p) -- case from Ptr C_MyType to Ptr MyType
freeMyType :: (Ptr a -> IO ()) -> Ptr (MyType a) -> IO ()
freeMyType freeA p = do
ct <- peek (castPtr p)
case ct of
C_TypeVar q -> freeA q
C_TypeAdd q1 q2 -> do
freeMyType freeA q1
freeMyType freeA q2
_ -> return () -- no children to free
free p
Note how we need to use helpers for the a
type. Whenever we want to make a newMyType
for a MyType a
, we'll need to provide a tailored newA
for the a
type. It would be possible to make this into a typeclass and even create an instance for all Storable a
, but I haven't done that here.
Now, suppose we have a Haskell function that uses all these data types that we'd like to export to C:
replaceSymbols :: OrdSymb -> Symbol -> Symbol -> MyType Symbol -> MyType Symbol
replaceSymbols ord sym1 sym2 = go
where
go (TypeVar s) | checkSymb s ord sym1 = TypeVar sym2
go (TypeAdd t1 t2) = TypeAdd (go t1) (go t2)
go rest = rest
with the helper functions previously defined:
compareSymb :: Symbol -> Symbol -> OrdSymb
compareSymb (Symbol x) (Symbol y) =
case compare x y of { EQ -> SEQ; LT -> SLT; GT -> SGT }
checkSymb :: Symbol -> OrdSymb -> Symbol -> Bool
checkSymb x ord y = compareSymb x y == ord
We'll need a few other things in MyLib2.chs
. First, we'll use c2hs
to define the OrdSymb
type (again, this automatically generates the associated data OrdSymb
):
{#enum ord_symb as OrdSymb {} deriving (Show, Eq)#}
type C_OrdSymb = CInt
and the symbol-marshalling code copied from MyLib1.hs
:
newtype Symbol = Symbol String
peekSymbol :: Ptr Symbol -> IO Symbol
peekSymbol ptr = Symbol <$> peekCString (castPtr ptr)
newSymbol :: Symbol -> IO (Ptr Symbol)
newSymbol (Symbol str) = castPtr <$> newCString str
freeSymbol :: Ptr Symbol -> IO ()
freeSymbol = free
Then, we can write the following C wrapper:
mylib_replace_symbols :: C_OrdSymb -> Ptr Symbol -> Ptr Symbol
-> Ptr (MyType Symbol) -> IO (Ptr (MyType Symbol))
mylib_replace_symbols ord psym1 psym2 pt = do
sym1 <- peekSymbol psym1
sym2 <- peekSymbol psym2
t <- peekMyType peekSymbol pt
let t' = replaceSymbols (genToEnum ord) sym1 sym2 t
newMyType newSymbol t'
Given that this returns a malloced data structure, it's helpful to also provide an exported function to free it:
mylib_free_mytype_symbol :: Ptr (MyType Symbol) -> IO ()
mylib_free_mytype_symbol = freeMyType freeSymbol
And let's export them:
foreign export ccall mylib_replace_symbols
:: C_OrdSymb -> Ptr Symbol -> Ptr Symbol
-> Ptr (MyType Symbol) -> IO (Ptr (MyType Symbol))
foreign export ccall mylib_free_mytype_symbol
:: Ptr (MyType Symbol) -> IO ()
If you take all the Haskell code in this section, starting with the module MyLib2
line and put it in MyLib2.chs
, then create/modify the following files:
// file: mylib.h
#ifndef MYLIB_H
#define MYLIB_H
enum ord_symb {
SLT = -1,
SEQ = 0,
SGT = 1
};
typedef char* symbol; // NUL-terminated string
typedef struct mytype_s {
enum mytype_cons_e {
TYPEDOUBLE,
TYPEINT,
TYPEVAR,
TYPEADD
} mytype_cons;
union {
double type_double;
int type_int;
void* type_var;
struct {
struct mytype_s *left;
struct mytype_s *right;
} type_add;
} mytype_value;
} mytype;
#endif
and:
// file: example2.c
#include <HsFFI.h>
#include "MyLib2_stub.h"
#include <stdio.h>
#include "mylib.h"
// AST for: 1.0 + foo
mytype node1 = { TYPEDOUBLE, {type_double: 1.0} };
mytype node2 = { TYPEVAR, {type_var: "foo"} };
mytype root = { TYPEADD, {type_add: {&node1, &node2} } };
int main(int argc, char *argv[])
{
hs_init(&argc, &argv);
mytype *p1 = mylib_replace_symbols(SEQ, "foo", "bar", &root);
printf("%s\n", // should print "bar"
(char*) p1->mytype_value.type_add.right->mytype_value.type_var);
mytype *p2 = mylib_replace_symbols(SEQ, "quux", "bar", &root);
printf("%s\n", // unchanged -- should still be "foo"
(char*) p2->mytype_value.type_add.right->mytype_value.type_var);
mylib_free_mytype_symbol(p1);
mylib_free_mytype_symbol(p2);
hs_exit();
return 0;
}
and add the executable example2
clause to your Cabal file:
-- file: ffitypes.cabal
name: ffitypes
version: 0.1.0.0
cabal-version: >= 1.22
build-type: Simple
executable example1
main-is: example1.c
other-modules: MyLib1
include-dirs: .
includes: mylib.h
build-depends: base
default-language: Haskell2010
cc-options: -Wall -O
ghc-options: -Wall -Wno-incomplete-patterns -O
executable example2
main-is: example2.c
other-modules: MyLib2
include-dirs: .
includes: mylib.h
build-depends: base
build-tools: c2hs
default-language: Haskell2010
cc-options: -Wall -O
ghc-options: -Wall -Wno-incomplete-patterns -O
and stick them all in the ffitypes
directory, then you should be able to stack build
and stack exec example2
.
As you can perhaps tell from the code above, it takes a lot of work to create even simple C bindings for Haskell libraries. If it's any consolation, creating Haskell bindings for C libraries is only a little bit easier. Best of luck!