How to write a pure String to String function in Haskell FFI to C++


I want to implement a function in C++ via Haskell FFI, which should have the (final) type of String -> String. Say, is it possible to re-implement the following function in C++ with the exact same signature?

import Data.Char (toUpper)

toUppers :: String -> String
toUppers s = map toUpper s

In particular, I want to avoid having IO in the return type, because introducing impurity (by which I mean the IO monad) for this simple task is logically unnecessary. All the examples involving a C string that I have seen so far return an IO value or a Ptr, which cannot be converted back to a pure String.

The reason I want to do this is that I have the impression that marshalling is messy with FFI. If I can solve the simplest case above (other than primitive types such as int), then I can do whatever data parsing I want on the C++ side, which should be easy.

The cost of parsing is negligible compared to the computation that I want to do between the marshalling to/from strings.

Thanks in advance.


Solution

  • You need to involve IO at least at some point, to allocate buffers for the C-strings. The straightforward solution here would probably be:

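    import Foreign
    import Foreign.C
    import System.IO.Unsafe as Unsafe

    -- The C++ function upper-cases a NUL-terminated buffer in place.
    foreign import ccall "touppers" c_touppers :: CString -> IO ()

    toUppers :: String -> String
    toUppers s =
      Unsafe.unsafePerformIO $
        -- withCString copies s into a temporary buffer that is freed afterwards.
        withCString s $ \cs ->
          c_touppers cs >> peekCString cs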
    

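    Here withCString marshals the Haskell string into a temporary NUL-terminated buffer, the C++ function upper-cases that buffer in place, and peekCString unmarshals the (changed!) buffer contents into a new Haskell string. Wrapping this in unsafePerformIO is justifiable because the result depends only on the argument and the temporary buffer never escapes. Note that withCString encodes the string using the current locale, and the input must not contain NUL characters.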

    Another solution is to delegate the IO handling to the bytestring library. That could be a good idea anyway if you are interested in performance. The solution would look roughly as follows:

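    import Data.ByteString (ByteString)
    import Data.ByteString.Internal (toForeignPtr, unsafeCreate)
    import Foreign (Ptr, Word8, plusPtr, withForeignPtr)
    import Foreign.C.Types (CInt)

    -- The length is passed as CInt to match the C++ parameter type int.
    foreign import ccall "touppers2"
      c_touppers2 :: CInt -> Ptr Word8 -> Ptr Word8 -> IO ()

    toUppers2 :: ByteString -> ByteString
    toUppers2 s =
      -- unsafeCreate allocates an l-byte output buffer (p2) for us to fill.
      unsafeCreate l $ \p2 ->
        withForeignPtr fp $ \p1 ->
          c_touppers2 (fromIntegral l) (p1 `plusPtr` o) p2
     where (fp, o, l) = toForeignPtr s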
    

    This is a bit more elegant, as we no longer have to do any marshalling, just convert pointers. On the other hand, the C++ side changes in two respects: the input may not be null-terminated, so we need to pass the length, and the result has to be written to a different buffer, because the input is no longer a copy.


    For reference, here are two quick-and-dirty C++ functions that fit the above imports:

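    #include <ctype.h>

    // In-place variant: expects a NUL-terminated, writable buffer.
    extern "C" void touppers(char *s) {
        // Cast to unsigned char: toupper is undefined for negative values.
        for (; *s; s++) *s = toupper((unsigned char)*s);
    }

    // Length-based variant: reads l bytes from s and writes l bytes to t.
    extern "C" void touppers2(int l, const char *s, char *t) {
        for (int i = 0; i < l; i++) t[i] = toupper((unsigned char)s[i]);
    }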
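
Before wiring these functions into Haskell, it can help to exercise them from plain C++ to see how the two calling conventions differ. The following is only a sketch under assumptions: the helpers upper_in_place, upper_copy and demo are hypothetical names introduced here, and they merely imitate what the Haskell wrappers do (a NUL-terminated temporary copy in the first case, a separate output buffer plus explicit length in the second).

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Same contracts as the two functions above, repeated so this file stands alone.
extern "C" void touppers(char *s) {
    for (; *s; s++)
        *s = static_cast<char>(std::toupper(static_cast<unsigned char>(*s)));
}

extern "C" void touppers2(int l, const char *s, char *t) {
    for (int i = 0; i < l; i++)
        t[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
}

// Hypothetical helper imitating withCString + peekCString:
// copy into a NUL-terminated buffer, mutate in place, read back up to the first NUL.
std::string upper_in_place(const std::string &in) {
    std::vector<char> buf(in.begin(), in.end());
    buf.push_back('\0');
    touppers(buf.data());
    return std::string(buf.data());
}

// Hypothetical helper imitating unsafeCreate:
// the input is left untouched and the result goes to a separate buffer of equal length.
std::string upper_copy(const std::string &in) {
    std::string out(in.size(), '\0');
    touppers2(static_cast<int>(in.size()), in.data(), &out[0]);
    return out;
}

// Small self-check covering both conventions.
void demo() {
    assert(upper_in_place("hello, ffi") == "HELLO, FFI");
    assert(upper_copy("hello, ffi") == "HELLO, FFI");

    // An embedded NUL byte shows the difference between the two conventions.
    const std::string with_nul("ab\0cd", 5);
    assert(upper_in_place(with_nul) == "AB");                  // stops at the NUL
    assert(upper_copy(with_nul) == std::string("AB\0CD", 5));  // processes all 5 bytes
}
```

Calling demo() from a main function runs the checks. The embedded-NUL case is the practical reason to prefer the length-based interface when the data is arbitrary bytes rather than text.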