haskell, encryption, base64, lazy-evaluation, bytestring

Using base64-bytestring with lazy ByteStrings


Here's what I'm trying to do in Haskell:

  • take a message in ByteString format (doesn't really matter if lazy or strict)
  • encrypt the message with an RSA public key
  • base64 encode the encrypted message

The RSA library that I'm using handles lazy ByteStrings internally. The Base64 library, however, uses strict ByteStrings only. My application uses lazy ByteStrings to send the messages to network sockets.

So, it looks like I have to convert between lazy and strict ByteStrings. Here's what I do:

encrypt :: CryptoRandomGen t => t -> RSA.PublicKey -> L.ByteString -> L.ByteString
encrypt gen pubkey msg = do
  let (ciphertext,_) = RSA.encrypt gen pubkey msg
  (L.fromChunks . map encode . L.toChunks) $ ciphertext

decrypt :: RSA.PrivateKey -> L.ByteString -> Either String L.ByteString
decrypt privkey ciphertext = do
  dec <- decode $ S.concat $ L.toChunks ciphertext
  return $ RSA.decrypt privkey $ L.fromChunks [dec]

Unfortunately, sometimes this fails. When I decrypt a message encrypted in this way it sometimes results in garbage followed by the actual message. I'm not sure exactly where the problem is: is it the conversion from lazy to strict ByteStrings or is it the base64 encoding step? Or is it both?

Lazy ByteStrings are just lists of strict ByteString chunks. Do I implicitly modify the length of the message by converting it?

Please enlighten me.


Solution

  • The problem is that base64 encoding maps every three bytes (3 × 8 bits) of input to four characters of output, each carrying 6 bits (4 × 6 bits), so when the size of the input is not a multiple of three, it has to add padding. This means that concatenating the results of encoding each chunk separately, which is what map encode . L.toChunks does, may not give the same result as encoding the entire thing.

    > encode "Haskell"
    "SGFza2VsbA=="
    > encode "Hask" `append` encode "ell"
    "SGFzaw==ZWxs"
    

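    Note that these are different even if you remove the = characters used to pad the output. The zero bits added to fill out the last input group of a chunk are still encoded in the middle of the stream, so everything after that chunk decodes at the wrong bit offset.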

    Your best bet is probably to find a library that supports lazy bytestrings. As a workaround, you can ensure that the sizes of all chunks (except the last) are multiples of three.

    Alternatively, if you don't mind keeping the whole thing in memory, convert the lazy bytestring to a strict one, encode the whole thing in one step, and convert back (if necessary).
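The two workarounds described above could look roughly like the sketch below. The names encodeRechunked, encodeWhole and sample are made up for illustration and are not part of any library. The sketch assumes the strict encode from Data.ByteString.Base64 (base64-bytestring) and a bytestring version that provides L.toStrict and L.fromStrict.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Base64 (encode)

-- Workaround 1 (hypothetical helper): regroup the chunks so that every piece
-- handed to `encode`, except possibly the last, has a length that is a
-- multiple of three. Only the final piece can then end in padding.
encodeRechunked :: L.ByteString -> L.ByteString
encodeRechunked = L.fromChunks . go S.empty . L.toChunks
  where
    -- `carry` holds the 0-2 bytes left over from the previous chunk.
    go carry []
      | S.null carry = []
      | otherwise    = [encode carry]
    go carry (c : cs)
      | S.null whole = go rest cs
      | otherwise    = encode whole : go rest cs
      where
        buf           = carry `S.append` c
        n             = S.length buf - S.length buf `mod` 3
        (whole, rest) = S.splitAt n buf

-- Workaround 2 (hypothetical helper): collapse everything into one strict
-- ByteString, encode it in a single step, and convert back to lazy.
encodeWhole :: L.ByteString -> L.ByteString
encodeWhole = L.fromStrict . encode . L.toStrict

-- The input from the example above, split into the chunks "Hask" and "ell".
sample :: L.ByteString
sample = L.fromChunks [C.pack "Hask", C.pack "ell"]
```

With these definitions, encodeRechunked sample and encodeWhole sample should both produce "SGFza2VsbA==", the same as encoding "Haskell" in one go. The first keeps the output lazy at the cost of copying up to two carried bytes per chunk; the second is simpler but holds the whole message in memory. Newer releases of base64-bytestring also appear to ship a Data.ByteString.Base64.Lazy module, so it is worth checking your installed version before writing either helper.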