https://www.coursera.org/learn/progfun2 assignment for Week 1 shows, as an example, a generator for maps of type Map[Int, Int]
:
lazy val genMap: Gen[Map[Int,Int]] = oneOf(
const(Map.empty[Int,Int]),
for {
k <- arbitrary[Int]
v <- arbitrary[Int]
m <- oneOf(const(Map.empty[Int,Int]), genMap)
} yield m.updated(k, v)
)
I'm new to Scala, but I'm familiar with generators in imperative programming languages. My understanding of the generator's execution flow is as follows:
arbitrary[Int]
is called, it returns a generator yielding an endless sequence of Int
s, the first generated value is assigned to k
arbitrary[Int]
is called again, it returns a new generator, the first generated value is assigned to v
k->v
, and yielded to the consumerm <- ...
definition, proceeding with a new random m
and the same k->v
mappingconst
nor the recursive genMap
ever run out of values, meaning that the "loop" for m
never terminates, so new values for v
and k
are never requested from the corresponding arbitrary
generators.My conclusion is that all generated maps would either be empty or include the k->v
mapping generated in the first iteration of the outermost invocation, i.e. genMap
can never generate a non-empty map without such a mapping.
Q1: are my analysis and my conclusion correct?
Q2: if they are, how can I implement a generator which, after generating a first map, would have non-zero chance of generating any possible map?
Q3: if I simplify the last definition in the for
-expression to m <- genMap
, does that change the generator's behaviour in any way?
In short, your analysis and conclusion aren't correct.
I suspect the root of the misunderstanding is in interpreting for
as a loop (it's not in general, and specifically not so in this context (when dealing with things that are more explicitly collections, for
is close enough, I guess)).
I'll explain from the top down.
oneOf
, given 1 or more generators will create a generator which, when asked to generate a value, will defer to one of the the given generators by random selection. So
oneOf(
const(Map.empty[Int, Int]),
k: Gen[Map[Int, Int]] // i.e. some generator for Map[Int, Int]
)
The output might be
someMapFromK, Map.empty, someMapFromK, someMapFromK, Map.empty, Map.empty...
In this case, our k
is
for {
k <- arbitrary[Int]
v <- arbitrary[Int]
m <- oneOf(const(Map.empty[Int, Int]), genMap) // genMap being the name the outermost generator will be bound to
} yield m.updated(k)
for
is syntactic sugar for calls to flatMap
and map
:
arbitrary[Int].flatMap { k =>
arbitrary[Int].flatMap { v =>
oneOf(const(Map.empty[Int, Int]), genMap).map { m =>
m.updated(k, v)
}
}
}
For something like List
, map
and flatMap
consume the entire collection. Gen
is lazier:
flatMap
basically means generate a value, and feed that value to a function that results in a Gen
map
basically means generate a value, and transform itIf we imagined a method on Gen
named sample
which gave us the "next" generated value (for this purpose, we'll say that for a Gen[T]
it will result in T
and never throw an exception, etc.) genMap
is exactly analogous to:
trait SimpleGen[T] { def sample: T }
lazy val genMap: SimpleGen[Map[Int, Int]] = new SimpleGen[Map[Int, Int]] {
def sample: Map[Int, Int] =
if (scala.util.Random.nextBoolean) Map.empty
else {
val k = arbitrary[Int].sample
val v = arbitrary[Int].sample
val m =
if (scala.util.Random.nextBoolean) Map.empty
else genMap.sample // Since genMap is lazy, we can recurse
m.updated(k, v)
}
}
Regarding the third question, in the original definition, the extra oneOf
serves to bound the recursion depth to prevent the stack from being blown. For that definition, there's a 1/4 chance of going recursive, while replacing the inner oneOf
with genMap
would have a 1/2 chance of going recursive. Thus (ignoring the chance of a collision in the k
s), for the first:
While for the second:
Technically the possibility of stack overflow implies that depending on how many recursions you can make there's a maximum number of k -> v
pairs in the Map
you can generate, so there are almost certainly Map
s that could not be generated.