
Is the map generator from the EPFL online course able to generate every possible map?


The https://www.coursera.org/learn/progfun2 assignment for Week 1 shows, as an example, a generator for maps of type Map[Int, Int]:

lazy val genMap: Gen[Map[Int,Int]] = oneOf(
  const(Map.empty[Int,Int]),
  for {
    k <- arbitrary[Int]
    v <- arbitrary[Int]
    m <- oneOf(const(Map.empty[Int,Int]), genMap)
  } yield m.updated(k, v)
)

I'm new to Scala, but I'm familiar with generators in imperative programming languages. My understanding of the generator's execution flow is as follows:

  1. arbitrary[Int] is called; it returns a generator yielding an endless sequence of Ints, and the first generated value is assigned to k
  2. arbitrary[Int] is called again; it returns a new generator, and the first generated value is assigned to v
  3. A random map is created recursively, updated with k->v, and yielded to the consumer
  4. When the next value from the generator is requested, execution resumes at the m <- ... definition, proceeding with a new random m and the same k->v mapping
  5. Neither const nor the recursive genMap ever runs out of values, meaning that the "loop" over m never terminates, so new values for v and k are never requested from the corresponding arbitrary generators.

My conclusion is that all generated maps would either be empty or include the k->v mapping generated in the first iteration of the outermost invocation, i.e. genMap can never generate a non-empty map without such a mapping.

Q1: are my analysis and my conclusion correct?

Q2: if they are, how can I implement a generator which, after generating a first map, would have non-zero chance of generating any possible map?

Q3: if I simplify the last definition in the for-expression to m <- genMap, does that change the generator's behaviour in any way?


Solution

  • In short, your analysis and conclusion aren't correct.

    I suspect the root of the misunderstanding is interpreting for as a loop. It isn't one in general, and specifically not in this context (when dealing with things that are more explicitly collections, treating for as a loop is close enough, I guess).

    I'll explain from the top down.

    oneOf, given one or more generators, will create a generator which, when asked to generate a value, will defer to one of the given generators by random selection. So

    oneOf(
      const(Map.empty[Int, Int]),
      k: Gen[Map[Int, Int]]  // i.e. some generator for Map[Int, Int]
    )
    

    The output might be

    someMapFromK, Map.empty, someMapFromK, someMapFromK, Map.empty, Map.empty...
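
    To see this concretely, here is a minimal sketch (assuming ScalaCheck is on the classpath; someGenK is a hypothetical stand-in for the k above, and Gen's sample method returns an Option):

    import org.scalacheck.Gen
    import org.scalacheck.Gen.{const, oneOf}

    // hypothetical stand-in for "some generator for Map[Int, Int]"
    val someGenK: Gen[Map[Int, Int]] = const(Map(1 -> 1))

    val g: Gen[Map[Int, Int]] = oneOf(const(Map.empty[Int, Int]), someGenK)

    // each sample independently picks one of the two generators at random,
    // so the output switches unpredictably between Some(Map()) and Some(Map(1 -> 1))
    (1 to 6).foreach(_ => println(g.sample))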
    

    In this case, our k is

    for {
      k <- arbitrary[Int]
      v <- arbitrary[Int]
      m <- oneOf(const(Map.empty[Int, Int]), genMap)  // genMap being the name the outermost generator will be bound to
    } yield m.updated(k, v)
    

    for is syntactic sugar for calls to flatMap and map:

    arbitrary[Int].flatMap { k =>
      arbitrary[Int].flatMap { v =>
        oneOf(const(Map.empty[Int, Int]), genMap).map { m =>
          m.updated(k, v)
        }
      }
    }
    

    For something like List, map and flatMap consume the entire collection up front. Gen is lazier (a small sketch after the next two points illustrates this):

    • flatMap basically means generate a value, and feed that value to a function that results in a Gen
    • map basically means generate a value, and transform it
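
    To make the laziness visible, here is a small sketch (again assuming ScalaCheck; the println is only there to show when the function passed to flatMap actually runs):

    import org.scalacheck.Gen
    import org.scalacheck.Arbitrary.arbitrary

    // nothing is generated when lazyDoubled is defined...
    val lazyDoubled: Gen[Int] =
      arbitrary[Int].flatMap { k =>
        println(s"generated k = $k")  // side effect to make the evaluation visible
        Gen.const(k * 2)
      }

    // ...a value is only produced when one is requested; each sample call
    // re-runs the whole pipeline with a fresh k
    lazyDoubled.sample
    lazyDoubled.sample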

    If we imagined a method on Gen named sample which gave us the "next" generated value (for this purpose, we'll say that for a Gen[T] it will result in a T and never throw an exception, etc.), genMap is exactly analogous to:

    trait SimpleGen[T] { def sample: T }
    
    lazy val genMap: SimpleGen[Map[Int, Int]] = new SimpleGen[Map[Int, Int]] {
      def sample: Map[Int, Int] =
        if (scala.util.Random.nextBoolean) Map.empty
        else {
          val k = arbitrary[Int].sample
          val v = arbitrary[Int].sample
          val m =
            if (scala.util.Random.nextBoolean) Map.empty
            else genMap.sample // Since genMap is lazy, we can recurse
          m.updated(k, v)
        }
    }
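
    To check that fresh values for k and v really are drawn on every generation (rather than being fixed after the first map, as the question concludes), here is a self-contained variant of the sketch you can paste into a REPL; Random.nextInt() stands in for the imagined arbitrary[Int].sample:

    import scala.util.Random

    trait SimpleGen[T] { def sample: T }

    lazy val genMap: SimpleGen[Map[Int, Int]] = new SimpleGen[Map[Int, Int]] {
      def sample: Map[Int, Int] =
        if (Random.nextBoolean()) Map.empty
        else {
          val k = Random.nextInt()  // a fresh key on every call
          val v = Random.nextInt()  // a fresh value on every call
          val m = if (Random.nextBoolean()) Map.empty else genMap.sample
          m.updated(k, v)
        }
    }

    // each call re-runs the whole body, so non-empty maps from different calls
    // share no particular k -> v pair
    (1 to 5).foreach(_ => println(genMap.sample))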
    

    Regarding the third question, in the original definition, the extra oneOf serves to bound the recursion depth to prevent the stack from being blown. For that definition, there's a 1/4 chance of going recursive, while replacing the inner oneOf with genMap would have a 1/2 chance of going recursive. Thus (ignoring the chance of a collision in the ks), for the first:

    • 50% chance of empty (50% chance of 1+)
    • 37.5% chance of size 1 (12.5% chance of 2+)
    • 9.375% chance of size 2 (3.125% chance of 3+)
    • 2.34375% chance of size 3 (0.78125% chance of 4+)...

    While for the second:

    • 50% chance of empty
    • 25% chance of size 1
    • 12.5% chance of size 2
    • 6.25% chance of size 3...
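
    If you want to sanity-check these numbers, here is a rough simulation sketch (assuming ScalaCheck is on the classpath; Gen's sample returns an Option, hence the flatten):

    import org.scalacheck.Gen
    import org.scalacheck.Gen.{const, oneOf}
    import org.scalacheck.Arbitrary.arbitrary

    lazy val genMap: Gen[Map[Int, Int]] = oneOf(
      const(Map.empty[Int, Int]),
      for {
        k <- arbitrary[Int]
        v <- arbitrary[Int]
        m <- oneOf(const(Map.empty[Int, Int]), genMap)
      } yield m.updated(k, v)
    )

    // tally observed sizes over many samples; expect roughly 50%, 37.5%, 9.4%, 2.3%, ...
    val sizes = List.fill(100000)(genMap.sample).flatten.map(_.size)
    sizes.groupBy(identity).toSeq.sortBy(_._1).take(5).foreach { case (size, ms) =>
      println(f"size $size: ${ms.size * 100.0 / sizes.size}%.2f%%")
    }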

    Technically, the possibility of stack overflow implies that, depending on how many recursive calls you can make, there's a maximum number of k -> v pairs in the Map you can generate, so there are almost certainly Maps that could not be generated.