
How do you generate a data frame with certain properties in QuickCheck?


I'd like to generate a data.frame using the QuickCheck R library. The data.frame must have specific (non-random) column names, and each column must have a certain type. Running rdata.frame gives a completely random data.frame, with column names like col.1, col.2, ..., which is not what I want.

For example, the data frame below has two columns (x and y) with the types integer and logical.

> data.frame(x=1:10, y=rep(F, 10))
    x     y
1   1 FALSE
2   2 FALSE
3   3 FALSE
4   4 FALSE
5   5 FALSE
6   6 FALSE
7   7 FALSE
8   8 FALSE
9   9 FALSE
10 10 FALSE
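A quick way to confirm the column types of a data frame like this is to apply class() to each column. This is plain base R, nothing QuickCheck-specific; it shows that y here is logical rather than factor.

```r
# Same data frame as above, written with FALSE instead of the shorthand F
df <- data.frame(x = 1:10, y = rep(FALSE, 10))

# class() of every column, returned as a named character vector
sapply(df, class)
# returns c(x = "integer", y = "logical")
```

The same check is handy later for asserting that a randomly generated data frame has the intended column types.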

I could do something like

> data.frame(x=rinteger(size=~10), y=rlogical(size=~10), z=rdouble(size=~10))
     x     y          z
1  -94 FALSE   7.124120
2  -64 FALSE -47.855625
3  -87 FALSE  -9.622184
4   -9 FALSE -28.678583
5  -78  TRUE  35.932244
6  -96  TRUE 116.449312
7  -63  TRUE  51.389978
8   65  TRUE -65.566058
9   71 FALSE 248.323594
10 -76  TRUE 138.238654

This generates the expected format (a data.frame with the correct column names and random data of a specific type in each column). But it seems to me that there must be a better way, since the number of rows is unimportant here and I shouldn't have to hard-code it in every generator.

It's fairly common for a function to take as input a data.frame that adheres to certain properties; unfortunately, the documentation is quite cryptic on this point.

Bonus: how do you merge certain constant values into this data.frame? (e.g., a column u whose values are all 0, in addition to the randomly generated data.)


Solution

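    library(quickcheck)
    library(functional)  # provides Curry()

    nr = rsize()  # random number of rows, drawn once and shared by all columns

    # Fix the size argument of each generator to ~nr so every column has nr elements
    generators.nr =
      lapply(
        list(ri = rinteger, rd = rdouble, rl = rlogical),  # all the ones you need
        Curry,
        size = ~nr)

    # Named, typed columns; the scalar w = 1 is recycled into a constant column (the bonus)
    with(
      generators.nr,
      data.frame(x = ri(), y = rd(), z = rl(), w = 1))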
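The same pattern can be sketched without functional or quickcheck: draw the row count once, pass it to every column generator, and let data.frame() recycle scalar constants. The gen_* functions and make_typed_df() below are hypothetical base-R stand-ins, not part of quickcheck; they only loosely mimic rinteger, rdouble and rlogical, and their value ranges are arbitrary assumptions.

```r
# Hypothetical base-R stand-ins for quickcheck's rinteger/rdouble/rlogical
gen_integer <- function(n) sample(-100L:100L, n, replace = TRUE)
gen_double  <- function(n) rnorm(n, sd = 100)
gen_logical <- function(n) sample(c(TRUE, FALSE), n, replace = TRUE)

# Hypothetical helper: draw one random row count, apply every generator to it,
# then append constant columns, which data.frame() recycles to that many rows.
make_typed_df <- function(generators, constants = list(), max_rows = 10L) {
  # At least one row: a length-one constant cannot be recycled into zero rows
  nr <- sample.int(max_rows, 1L)
  cols <- lapply(generators, function(g) g(nr))
  do.call(data.frame, c(cols, constants))
}

df <- make_typed_df(
  list(x = gen_integer, y = gen_double, z = gen_logical),
  constants = list(u = 0)
)
str(df)
```

Keeping the generators in a named list means the list names become the column names, so adding or renaming a column is a one-line change, and the constant columns answer the bonus question the same way w = 1 does above.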