
How can I serialize and deserialize a Haskell data structure to a ByteString?


I have a TCP server written in C++ that expects a request in the following format:

struct Header {
   int headerField1;
   int headerField2;
};

struct Request {
   Header header;
   char uniqueID[14];
   char password[12];
};

I want to implement a client in Haskell that sends this request to my server.

I have tried Data.Binary.encode, which doesn't do the trick. I am also confused about how to represent a fixed-size type in Haskell, i.e. char[12].

Haskell Code:

{-# LANGUAGE DeriveGeneric #-}
import Data.Binary (Binary)
import Data.ByteString (ByteString)
import Data.Word (Word32)
import GHC.Generics (Generic)

data Header = Header
  { headerField1 :: Word32
  , headerField2 :: Word32
  } deriving (Generic)
instance Binary Header

data Request = Request
  { header   :: Header
  , uniqueID :: ByteString -- I am not sure which data type to use here.
  , password :: ByteString -- Same as above: the length is fixed at 12 bytes.
  } deriving (Generic)
instance Binary Request
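To see why Data.Binary.encode doesn't match the server's layout, it helps to look at what the derived instances actually emit. The sketch below reuses the types above (with Show added) and made-up sample values. The binary package's Word32 instance writes big-endian, and its ByteString instance prefixes the bytes with their length as an 8-byte integer, so the request encodes to 50 bytes rather than the 34 bytes of field data in the C++ struct (assuming a 4-byte int). Load it in GHCi and inspect headerBytes and requestSize.

```haskell
{-# LANGUAGE DeriveGeneric #-}
import Data.Binary (Binary, encode)
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import Data.Word (Word32, Word8)
import GHC.Generics (Generic)

-- The question's types, with generically derived Binary instances.
data Header = Header
  { headerField1 :: Word32
  , headerField2 :: Word32
  } deriving (Generic, Show)
instance Binary Header

data Request = Request
  { header   :: Header
  , uniqueID :: ByteString
  , password :: ByteString
  } deriving (Generic, Show)
instance Binary Request

-- Binary's Word32 instance is big-endian: Header 1 2 becomes [0,0,0,1,0,0,0,2].
headerBytes :: [Word8]
headerBytes = BL.unpack (encode (Header 1 2))

-- Made-up values with exactly 14 and 12 bytes.
sampleRequest :: Request
sampleRequest = Request (Header 1 2) (BC.pack "ABCDEFGHIJKLMN") (BC.pack "secret123456")

-- Each ByteString is preceded by an 8-byte length: 8 + (8 + 14) + (8 + 12) = 50.
requestSize :: Int64
requestSize = BL.length (encode sampleRequest)
```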

I have written a custom ByteString-to-data parser, which works well for Header because it contains no fixed-size arrays:

parseHeader :: Get Header
parseHeader =
    Header <$>
        getWord32le <*>
        getWord32le
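The same approach works in the other direction with Data.Binary.Put. Below is a minimal sketch; putHeader and headerWire are hypothetical names, and it assumes the server uses a 4-byte little-endian int, which is what getWord32le already implies.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, getWord32le, runGet)
import Data.Binary.Put (Put, putWord32le, runPut)
import Data.Word (Word32)

data Header = Header
  { headerField1 :: Word32
  , headerField2 :: Word32
  } deriving (Show, Eq)

-- The parser from the question, with a type signature.
parseHeader :: Get Header
parseHeader = Header <$> getWord32le <*> getWord32le

-- Hypothetical serializer mirroring parseHeader field by field.
putHeader :: Header -> Put
putHeader (Header f1 f2) = putWord32le f1 >> putWord32le f2

-- The 8 bytes a little-endian C++ server would read for Header 1 2.
headerWire :: BL.ByteString
headerWire = runPut (putHeader (Header 1 2))
```

Keeping each parser next to its serializer makes it easy to check that both walk the fields in the same order.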

I am looking for an idiomatic Haskell way to serialize the packet structure defined above to a ByteString and deserialize it back, along with a way to represent a fixed-size type such as char[12].


Solution

  • To address the main question first: you can parse ByteStrings of a known length with getByteString (or getLazyByteString), so a binary parser for Request could be:

    parseRequest :: Get Request
    parseRequest =
      Request
        <$> parseHeader
        <*> getByteString 14
        <*> getByteString 12
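    A minimal sketch of running this parser on a hand-built buffer with runGetOrFail, assuming the Request fields are plain strict ByteStrings as in the question and the server uses a 4-byte little-endian int. sampleWire and parsed are hypothetical names, and the values are made up.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, getByteString, getWord32le, runGetOrFail)
import Data.Word (Word32)

data Header = Header Word32 Word32 deriving (Show, Eq)

data Request = Request
  { header   :: Header
  , uniqueID :: BS.ByteString
  , password :: BS.ByteString
  } deriving (Show, Eq)

parseHeader :: Get Header
parseHeader = Header <$> getWord32le <*> getWord32le

parseRequest :: Get Request
parseRequest =
  Request
    <$> parseHeader
    <*> getByteString 14
    <*> getByteString 12

-- A 34-byte buffer: two little-endian 32-bit ints, then the two char arrays.
sampleWire :: BL.ByteString
sampleWire = BL.fromStrict (BS.concat
  [ BS.pack [7,0,0,0, 9,0,0,0]
  , BC.pack "ABCDEFGHIJKLMN"
  , BC.pack "secret123456"
  ])

-- runGetOrFail also returns the leftover input and the number of bytes consumed;
-- here only the error message or the parsed value is kept.
parsed :: Either String Request
parsed = case runGetOrFail parseRequest sampleWire of
  Left (_, _, err)      -> Left err
  Right (_, _, request) -> Right request
```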
    

    If you also have a serializer, say putRequest, you can combine it with the parser in a Binary instance, which lets you use convenience functions of the library such as encode and decode (but you don't have to).

    instance Binary Request where
      get = parseRequest
      put = putRequest
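    A possible putRequest, sketched under two assumptions: the header ints are 4-byte little-endian, and a too-short char array should be NUL-padded (a common C-string convention, but check what the server expects). The hypothetical helper putFixed truncates or pads each field so it always occupies exactly its declared number of bytes. With the Binary instance in place, encode and decode work directly on Request.

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (Get, getByteString, getWord32le)
import Data.Binary.Put (Put, putByteString, putWord32le)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

data Header = Header Word32 Word32 deriving (Show, Eq)

data Request = Request
  { header   :: Header
  , uniqueID :: BS.ByteString
  , password :: BS.ByteString
  } deriving (Show, Eq)

parseHeader :: Get Header
parseHeader = Header <$> getWord32le <*> getWord32le

parseRequest :: Get Request
parseRequest = Request <$> parseHeader <*> getByteString 14 <*> getByteString 12

-- Hypothetical helper: write exactly n bytes, truncating longer input and
-- padding shorter input with NUL bytes (assumed C-string convention).
putFixed :: Int -> BS.ByteString -> Put
putFixed n bs = putByteString (BS.take n bs <> BS.replicate (n - BS.length bs) 0)

-- Writes the fields in the same order parseRequest reads them.
putRequest :: Request -> Put
putRequest (Request (Header f1 f2) uid pw) = do
  putWord32le f1
  putWord32le f2
  putFixed 14 uid
  putFixed 12 pw

instance Binary Request where
  get = parseRequest
  put = putRequest

-- Made-up sample with exactly-sized fields.
sample :: Request
sample = Request (Header 7 9) (BC.pack "ABCDEFGHIJKLMN") (BC.pack "secret123456")
```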
    

    To avoid mixing up the password and the ID, it seems a good idea to wrap them in newtypes:

    newtype UniqueID = MkUniqueID ByteString  -- length 14
    newtype Password = MkPassword ByteString  -- length 12
    

    When implementing operations on those, make sure that they don't construct values of the wrong length. Then you can hide the constructors when exporting the types, so that users cannot break those invariants.
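    One way to enforce the length invariant is a pair of smart constructors that return Maybe. The names mkUniqueID, mkPassword, unUniqueID and unPassword are hypothetical. In a real project these definitions would go in their own module whose export list names the types without (..), which hides MkUniqueID and MkPassword from users.

```haskell
-- Intended to live in its own module, e.g.
-- module Credentials (UniqueID, Password, mkUniqueID, mkPassword, unUniqueID, unPassword) where
-- Exporting UniqueID without (..) keeps the MkUniqueID constructor private.
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC

newtype UniqueID = MkUniqueID BS.ByteString deriving (Show, Eq)  -- length 14
newtype Password = MkPassword BS.ByteString deriving (Show, Eq)  -- length 12

-- Smart constructors: reject input of the wrong length.
mkUniqueID :: BS.ByteString -> Maybe UniqueID
mkUniqueID bs
  | BS.length bs == 14 = Just (MkUniqueID bs)
  | otherwise          = Nothing

mkPassword :: BS.ByteString -> Maybe Password
mkPassword bs
  | BS.length bs == 12 = Just (MkPassword bs)
  | otherwise          = Nothing

-- Read-only accessors; safe to export because they cannot break the invariant.
unUniqueID :: UniqueID -> BS.ByteString
unUniqueID (MkUniqueID bs) = bs

unPassword :: Password -> BS.ByteString
unPassword (MkPassword bs) = bs
```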

    The parsers for those types are where you specify the lengths you want:

    parseUniqueID :: Get UniqueID
    parseUniqueID = MkUniqueID <$> getByteString 14
    
    parsePassword :: Get Password
    parsePassword = MkPassword <$> getByteString 12
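    Because getByteString fails when fewer bytes remain than requested, a truncated packet is reported as a parse error rather than silently producing a short field. A small sketch; putUniqueID, putPassword and tryPassword are hypothetical names, and the serializers can stay trivial because the length invariant is assumed to hold.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, getByteString, runGetOrFail)
import Data.Binary.Put (Put, putByteString, runPut)

newtype UniqueID = MkUniqueID BS.ByteString deriving (Show, Eq)  -- length 14
newtype Password = MkPassword BS.ByteString deriving (Show, Eq)  -- length 12

parseUniqueID :: Get UniqueID
parseUniqueID = MkUniqueID <$> getByteString 14

parsePassword :: Get Password
parsePassword = MkPassword <$> getByteString 12

-- If the length invariant holds, writing the raw bytes is enough.
putUniqueID :: UniqueID -> Put
putUniqueID (MkUniqueID bs) = putByteString bs

putPassword :: Password -> Put
putPassword (MkPassword bs) = putByteString bs

-- Returns Left with the decoder's error message if the input is too short.
tryPassword :: BL.ByteString -> Either String Password
tryPassword input = case runGetOrFail parsePassword input of
  Left (_, _, err) -> Left err
  Right (_, _, pw) -> Right pw
```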
    

    This makes the definition of Request more descriptive: the only way to mix up a password and an ID in Haskell code is to get the field order wrong in the parser or serializer, which reduces the potential for mistakes elsewhere.

    data Request = Request
      { header   :: Header
      , uniqueID :: UniqueID
      , password :: Password
      }
    
    parseRequest :: Get Request
    parseRequest =
      Request
        <$> parseHeader
        <*> parseUniqueID
        <*> parsePassword
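    Putting the pieces together, a complete sketch might look like the following. The put* names and sample values are hypothetical, and it assumes 4-byte little-endian ints and a server that reads exactly 34 bytes. If the server instead reads sizeof(Request), note that common ABIs pad that struct to 36 bytes (it is aligned to its 4-byte int members), so two filler bytes would need to be appended. To send the result, something like sendAll from Network.Socket.ByteString.Lazy in the network package can write the encoded bytes to a connected socket.

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (Get, getByteString, getWord32le)
import Data.Binary.Put (Put, putByteString, putWord32le)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

data Header = Header
  { headerField1 :: Word32
  , headerField2 :: Word32
  } deriving (Show, Eq)

newtype UniqueID = MkUniqueID BS.ByteString deriving (Show, Eq)  -- length 14
newtype Password = MkPassword BS.ByteString deriving (Show, Eq)  -- length 12

data Request = Request
  { header   :: Header
  , uniqueID :: UniqueID
  , password :: Password
  } deriving (Show, Eq)

-- Parsers: little-endian ints, then fixed-size byte arrays.
parseHeader :: Get Header
parseHeader = Header <$> getWord32le <*> getWord32le

parseUniqueID :: Get UniqueID
parseUniqueID = MkUniqueID <$> getByteString 14

parsePassword :: Get Password
parsePassword = MkPassword <$> getByteString 12

parseRequest :: Get Request
parseRequest = Request <$> parseHeader <*> parseUniqueID <*> parsePassword

-- Serializers writing fields in the same order as the parsers.
putHeader :: Header -> Put
putHeader (Header f1 f2) = putWord32le f1 >> putWord32le f2

putRequest :: Request -> Put
putRequest (Request h (MkUniqueID uid) (MkPassword pw)) = do
  putHeader h
  putByteString uid  -- relies on the 14-byte invariant
  putByteString pw   -- relies on the 12-byte invariant

instance Binary Request where
  get = parseRequest
  put = putRequest

sample :: Request
sample = Request (Header 7 9) (MkUniqueID (BC.pack "ABCDEFGHIJKLMN")) (MkPassword (BC.pack "secret123456"))

-- The bytes to send; decode parses data in the same format.
wire :: BL.ByteString
wire = encode sample
```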