Tags: haskell, random, aeson, io-monad

How to use randomness in Haskell to produce instances of a JSON "model"?


I have a task where I have to read a JSON document from a file and generate instances of it based on its model. I'm using aeson to serialize the objects, but I'm having a huge problem dealing with randomness to produce new objects.

Producing a new JSON based on what I get from the file is pretty straightforward:

{-# LANGUAGE OverloadedLists #-}
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy.Char8    as ByteString
import qualified Data.Aeson                    as Aeson
import qualified Data.Aeson.Types              as Types
import qualified Data.Text

read :: String -> IO ()
read filePath = do
    json <- readFile filePath
    let Just parsedJSON =
            Aeson.decode $ ByteString.pack json :: Maybe Aeson.Object
    let newJSON = fmap valueMapper parsedJSON
    print $ Aeson.encode newJSON


valueMapper :: Types.Value -> Types.Value
valueMapper value =
    case value of
        Types.String _      -> Types.String "randomValue"
        Types.Number _      -> Types.Number 0
        Types.Object object -> Types.Object $ fmap valueMapper object
        Types.Array  array  -> Types.Array $ fmap valueMapper array

My first attempt was to produce random values outside the IO. I used this function:

randomStr :: String
randomStr = take 10 $ randomRs ('a','z') $ unsafePerformIO newStdGen

Putting it on valueMapper:

valueMapper :: Types.Value -> Types.Value
valueMapper value =
    case value of
        Types.String _      -> Types.String $ Data.Text.pack randomStr
        Types.Number _      -> Types.Number 0
        Types.Object object -> Types.Object $ fmap valueMapper object
        Types.Array  array  -> Types.Array $ fmap valueMapper array

This "works", but all generated strings are the same, for every String field.

After a little research, I found out that if I want to produce different values for each String occurrence, I have to use the IO:

randomStr :: IO String
randomStr = replicateM 10 (randomRIO ('a', 'z'))

Now, I know that I have different strings for each call of randomStr... But I also have a type mismatch. Aeson's String constructor for Value takes a Data.Text.Text, but what I have is an IO String. As far as I know, my strings can never come back from IO.

I don't know if there is a way (hope so) to use the latest randomStr to compose my new JSON object. I also don't know if my approach is a good one. I'm open to suggestions about how I can get this to work, in my way or any other (some tips on how to write better code would be awesome too).


Solution

  • It's relatively common when writing Haskell code to find yourself needing to convert a block of pure code into an IO action (or into monadic code in some other monad). It comes with practice (and, as per the comments, after reading a LOT of tutorials), but I can show you my thought process as I work through your code example.

    As you've discovered, trying to "hide" the IO using unsafePerformIO is a terrible idea. The correct alternative is to rewrite the whole thing to operate in the IO monad, even though -- as you've seen -- rewriting randomStr :: String as randomStr :: IO String starts a chain of type mismatch errors that need to be resolved all the way to the top.

    So, let's resolve them. If valueMapper is going to make use of randomStr :: IO String, it too will need to operate in the IO monad:

    valueMapper :: Types.Value -> IO Types.Value
    

    (Note: If you make this change while using a live type-checking IDE, you'll find that the invocation of valueMapper in read is now flagged as a type error, as are the four branches in the case statement.)

    Anyway, there's no problem with the outer structure of valueMapper, namely case-matching on the argument:

    valueMapper value =
        case value of
            Types.String _      -> ???
            Types.Number _      -> ???
            Types.Object object -> ???
            Types.Array  array  -> ???
    

    The difference is that now each ??? needs to return an IO Types.Value instead of a Types.Value. Let's start with an easy one. Suppose we aren't going to generate random numbers yet, so we just want to convert the branch:

    Types.Number _ -> Types.Number 0   -- pure version
    

    to IO. Here, we have a pure value Types.Number 0 :: Types.Value, and we want a monadic version. That's what return is used for:

    Types.Number _ -> return (Types.Number 0)   -- IO version
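
    As a standalone illustration of the same lifting step, here is a minimal sketch that uses a plain Int instead of an aeson value. The names pureAnswer and ioAnswer are made up for this example:

    ```haskell
    -- A pure value.
    pureAnswer :: Int
    pureAnswer = 42

    -- The same value lifted into IO. Running this action performs
    -- no side effects; it just hands back the value.
    ioAnswer :: IO Int
    ioAnswer = return pureAnswer
    ```

    In current GHC versions, pure can be used in place of return here with the same meaning.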
    

    The next easiest is the string branch. Right now, it looks like:

    Types.String _ -> Types.String $ Data.Text.pack randomStr
    

    where randomStr is now an IO String, so the branch no longer type checks. We have an IO String and want an IO Types.Value, but the construct:

    Types.String $ Data.Text.pack _
    

    is trying to convert a String to a Types.Value directly. This is a common problem when working with monadic values. We have an IO a that we want to convert to an IO b, but all we have is a function (here, Types.String . Data.Text.pack) to perform the direct conversion a -> b. It would be helpful if we had some adapter function with signature:

    foo :: (a -> b) -> IO a -> IO b
    

    Fortunately, because IO, like all monads, is also a functor, we do have such an adapter:

    fmap :: (a -> b) -> IO a -> IO b
    

    so, we can write the branch as:

    Types.String _ -> fmap (Types.String . Data.Text.pack) randomStr
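
    The same adapter pattern can be tried in isolation. This is a minimal sketch that needs only base; fixedStr is a hypothetical stand-in for randomStr, and shout stands in for the pure conversion Types.String . Data.Text.pack:

    ```haskell
    import Data.Char (toUpper)

    -- Hypothetical stand-in for randomStr: an IO action producing a String.
    fixedStr :: IO String
    fixedStr = return "abc"

    -- A pure function of type a -> b that we want to apply to the result.
    shout :: String -> String
    shout = map toUpper

    -- fmap adapts the pure function so it works on the IO action,
    -- turning an IO String into another IO String.
    shoutedStr :: IO String
    shoutedStr = fmap shout fixedStr
    ```

    Running shoutedStr in GHCi yields "ABC": the action runs first, and the pure function is applied to its result.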
    

    The last two, for objects and arrays, are tougher, though the solution is the same for each. Taking the object branch, its pure version looks like:

    Types.Object object -> Types.Object $ fmap valueMapper object
    

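    In the pure version, fmap is used to apply valueMapper :: Types.Value -> Types.Value to each element of object to get a new container of Types.Value, which is then converted to a Types.Value using the Types.Object constructor. To keep the types simple, the rest of this walkthrough treats object as a list, object :: [Types.Value]. Strictly speaking it is a map from keys to values (HashMap Text Types.Value in aeson before 2.0, KeyMap Types.Value since), and array is a Vector, but all of these are Functor and Traversable, so the reasoning carries over unchanged.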

    Let's tackle the fmap first. We will still have object :: [Types.Value] from case matching on our pure argument value, so that hasn't changed. But we want to apply the function valueMapper :: Types.Value -> IO Types.Value to each element of object. The result will be a list of Types.Value, but in an IO monad context, so the full result type will be IO [Types.Value]. That is, we want an adapter function:

    bar :: (a -> IO a) -> [a] -> IO [a]
           ^^^^^^^^^^^    ^^^    ^^^^^^- output list in IO context
                     |      `- input list
                     `- element-by-element conversion
    

    Such a function already exists in more general form as traverse. The full signature is:

    traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
    

    but specialized to a list traversable and an IO applicative, it's:

    traverse :: (a -> IO b) -> [a] -> IO [b]
    

    Using it in our object branch, the result will look like:

    Types.Object object -> Types.Object $ traverse valueMapper object
    

    This still gives a type error because traverse valueMapper object returns an IO [Types.Value], and we are trying to use the direct conversion Types.Object :: [Types.Value] -> Types.Value where we actually need a conversion IO [Types.Value] -> IO Types.Value. This is the same problem we had with the string branch above, and the solution is to use fmap, so the following type checks:

    Types.Object object -> fmap Types.Object (traverse valueMapper object)
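
    To see the traverse-then-fmap combination on its own, here is a small sketch that needs only base. Everything in it is hypothetical: nextLabel plays the role of valueMapper (it replaces each element with a freshly produced value), and Box plays the role of the Types.Object constructor:

    ```haskell
    import Data.IORef (IORef, newIORef, atomicModifyIORef')

    -- Hypothetical wrapper, standing in for the Types.Object constructor.
    newtype Box = Box [Int] deriving (Eq, Show)

    -- Hypothetical effectful function: ignores its argument and returns
    -- the current counter value, incrementing the counter as a side effect.
    nextLabel :: IORef Int -> a -> IO Int
    nextLabel counter _ = atomicModifyIORef' counter (\n -> (n + 1, n))

    -- traverse runs the action once per element, left to right, and
    -- collects the results as IO [Int]; fmap then applies the pure
    -- constructor Box inside the IO context.
    relabel :: IORef Int -> [a] -> IO Box
    relabel counter xs = fmap Box (traverse (nextLabel counter) xs)
    ```

    With a counter created by newIORef 0, relabel counter "abc" produces Box [0,1,2], one fresh value per input element.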
    

    You might want to take a moment here to perform the equivalent conversion on the array branch.

    With those changes, valueMapper will type check. Now the only issue is that its usage in read won't type check. The problem is the line:

    let newJSON = fmap valueMapper parsedJSON
    

    Here parsedJSON is of type Aeson.Object AKA Types.Object, which is actually an alias for HashMap Text Types.Value (in aeson 2.0 and later, for KeyMap Types.Value). The fmap here is used to apply a pure valueMapper :: Types.Value -> Types.Value to each map element in turn.

    Now, we want to apply valueMapper :: Types.Value -> IO Types.Value to each element in turn and get the whole result in an IO context as IO Aeson.Object. Fortunately, Aeson.Object AKA HashMap is Traversable, so the solution here is the same as for the object and array branches before -- replace the fmap with traverse:

    let newJSON = traverse valueMapper parsedJSON
    

    This still won't quite work, as the next line:

    print $ Aeson.encode newJSON
    

    expects newJSON to be a pure Aeson.Object, but the return value of the traverse call is in the IO context, so it's IO Aeson.Object. We could try to rewrite this print line to expect newJSON :: IO Aeson.Object. For example, the following would work:

    print =<< fmap Aeson.encode newJSON
    

    However, there's actually a much simpler way. In a do-block, the left arrow <- notation can be used for this purpose. Where:

    let newJSON = traverse valueMapper parsedJSON
    

    assigns newJSON an IO action of type IO Aeson.Object, the alternative:

    newJSON <- traverse valueMapper parsedJSON
    

    "unwraps" the IO action to assign newJSON the underlying Aeson.Object for use in subsequent statements. So:

    newJSON <- traverse valueMapper parsedJSON
    print $ Aeson.encode newJSON
    

    will type check.
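
    The difference between let and <- can be observed directly with a small sketch that needs only base. The names tick and demo are made up for this example:

    ```haskell
    import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

    -- Hypothetical effectful action: bumps a counter and returns its new value.
    tick :: IORef Int -> IO Int
    tick counter = do
        modifyIORef' counter (+ 1)
        readIORef counter

    demo :: IO (Int, Int, Int)
    demo = do
        counter <- newIORef 0
        let action = tick counter    -- only names the action; nothing runs yet
        before <- readIORef counter  -- counter is still 0
        first  <- action             -- runs the action and binds its result
        second <- action             -- runs the same action a second time
        return (before, first, second)
    ```

    Running demo yields (0,1,2): the let binding alone has no effect on the counter, while each <- runs the action and binds the plain Int it produces.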

    One more stylistic note. It's common to use the infix synonym <$> in place of fmap to apply pure functions to IO actions. So, the final program would look like this:

    import qualified Data.ByteString.Lazy.Char8    as ByteString
    import qualified Data.Aeson                    as Aeson
    import qualified Data.Aeson.Types              as Types
    import qualified Data.Text
    
    import Control.Monad
    import System.Random
    
    read :: String -> IO ()
    read filePath = do
        json <- readFile filePath
        let Just parsedJSON =
                Aeson.decode $ ByteString.pack json :: Maybe Aeson.Object
        newJSON <- traverse valueMapper parsedJSON
        print $ Aeson.encode newJSON
    
    valueMapper :: Types.Value -> IO Types.Value
    valueMapper value =
        case value of
            Types.String _      -> Types.String . Data.Text.pack <$> randomStr
            Types.Number _      -> return $ Types.Number 0
            Types.Object object -> Types.Object <$> traverse valueMapper object
            Types.Array  array  -> Types.Array <$> traverse valueMapper array
    
    randomStr :: IO String
    randomStr = replicateM 10 (randomRIO ('a', 'z'))
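
    If you later want random numbers as well, the same pattern extends naturally. The following is only a sketch under a few assumptions that are not part of the program above: randomNumber and randomizeJSON are hypothetical names, the range 0 to 100 is arbitrary, booleans and nulls are passed through unchanged (the original case expression does not cover them), and a failed parse is reported as Nothing instead of crashing on the let Just pattern:

    ```haskell
    import qualified Data.Aeson                 as Aeson
    import qualified Data.Aeson.Types           as Types
    import qualified Data.ByteString.Lazy.Char8 as ByteString
    import qualified Data.Text                  as Text

    import Control.Monad (replicateM)
    import System.Random (randomRIO)

    randomStr :: IO String
    randomStr = replicateM 10 (randomRIO ('a', 'z'))

    -- Hypothetical helper: a random integer in an arbitrary range,
    -- wrapped as a JSON number.
    randomNumber :: IO Types.Value
    randomNumber = Types.Number . fromIntegral <$> (randomRIO (0, 100) :: IO Int)

    valueMapper :: Types.Value -> IO Types.Value
    valueMapper value =
        case value of
            Types.String _      -> Types.String . Text.pack <$> randomStr
            Types.Number _      -> randomNumber
            Types.Object object -> Types.Object <$> traverse valueMapper object
            Types.Array  array  -> Types.Array <$> traverse valueMapper array
            other               -> return other  -- Bool and Null are left as-is

    -- Hypothetical wrapper: parses a JSON object, randomizes its values,
    -- and re-encodes it. Returns Nothing if the input is not a JSON object.
    randomizeJSON :: ByteString.ByteString -> IO (Maybe ByteString.ByteString)
    randomizeJSON input =
        case Aeson.decode input :: Maybe Aeson.Object of
            Nothing     -> return Nothing
            Just parsed -> Just . Aeson.encode <$> traverse valueMapper parsed
    ```

    Keeping the file reading and printing out of randomizeJSON means the randomization can be tried from GHCi on a literal string, with readFile and print added around it as in read above.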