rsample

How to randomly select the content of some cells from a data frame?


It seems simple, but I cannot find the solution: I want to randomly select some elements from a data frame imported from a .xlsx file. Is there a function such as sample_n to do this?

My problem is that sample_n returns a number of whole rows rather than individual values. That is, I want a sample of elements taken from any row (possibly several from the same row).

Here is an example:

    library(dplyr)  # provides sample_n()

    # sample_n() draws n whole rows; it has no na.rm argument
    MAS <- function(x, n) {
      sample_n(x, n)
    }

    df <- data.frame(
      "entero" = 1:4,
      "factor" = c("a", "b", "c", "d"),
      "numero" = c(NA, 3.4, NA, 5.6),
      "cadena" = as.character(c("a", "b", "c", "d"))
    )
    MAS(df, 2)

which returns, for example:

      entero factor numero cadena
    1      3      c     NA      c
    2      4      d    5.6      d

That is, whole rows instead of single elements. I would also like to avoid the 'NA' values, by the way.

Thank you.


Solution

  • If you don't mind getting a character vector, it is as simple as coercing the data.frame into a (character) matrix and then sampling from it:

    ddff <- as.matrix(df)
    sample(ddff, 2)
    

    To avoid sampling NA values, just restrict the sampling to the cells which aren't missing:

    sample(ddff[!is.na(ddff)], 2)
    

    If you want to keep the class, you'll need to get a list in return, and the sampling gets a bit trickier.
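One possible way to do the class-preserving variant is sketched below. This is not from the original answer: `sample_cells` is a hypothetical helper name, and the approach (locate the non-missing cells with `which(..., arr.ind = TRUE)`, sample their positions, then extract each cell with `[[`) is just one assumption about how the "trickier" sampling could be done. It returns a list, so each element keeps the class of its column.

```r
# Hypothetical helper (not part of the original answer): sample n individual
# cells from a data frame, skipping NAs, and return them as a list so that
# every value keeps the class of the column it came from.
sample_cells <- function(df, n, replace = FALSE) {
  # Row/column positions of every non-missing cell
  idx <- which(!is.na(df), arr.ind = TRUE)
  # Pick n of those positions at random
  picked <- idx[sample.int(nrow(idx), n, replace = replace), , drop = FALSE]
  # Extract each cell from its column with [[ so no coercion happens
  lapply(seq_len(nrow(picked)), function(i) {
    df[[picked[i, "col"]]][[picked[i, "row"]]]
  })
}

# Usage with the data frame from the question
df <- data.frame(
  entero = 1:4,
  factor = c("a", "b", "c", "d"),
  numero = c(NA, 3.4, NA, 5.6),
  cadena = c("a", "b", "c", "d")
)
res <- sample_cells(df, 3)
str(res)  # a list of 3 values, each integer, character or numeric
```

Because positions are sampled rather than values, several cells can come from the same row, and passing `replace = TRUE` would also allow the same cell to be drawn more than once.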