Tags: r, if-statement, sample, dplyr

How to replace specific values in a dataset with randomized numbers?


I have a data column that contains a bunch of ranges as strings (e.g. "2 to 4", "5 to 6", "7 to 8" etc.). I'm trying to create a new column that converts each of these values to a random number within the given range. How can I leverage conditional logic within my function to solve this problem?

I think the function should be something along the lines of:

df<-mutate(df, c2=ifelse(df$c=="2 to 4", sample(2:4, 1, replace=TRUE), "NA"))

This should produce a new column in my dataset that replaces every "2 to 4" with a random integer between 2 and 4. However, it is not working: every value is replaced with "NA".

Ideally, starting from a column like:

df$c <- c("2 to 4","2 to 4","5 to 6")

I would get a new column such as:

df$c2 <- c(3, 2, 5)

Does anyone have any idea how to do this?
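A quick check on hypothetical toy data (mirroring the column `c` from the question) shows one reason the `ifelse()` attempt cannot produce per-row random numbers. `sample(2:4, 1)` is evaluated a single time before `ifelse()` runs, so every matching row receives the same recycled draw. The sketch uses a real `NA` rather than the string `"NA"`, which would otherwise coerce the whole new column to character.

```r
# Hypothetical toy data mirroring the question's column `c`
df <- data.frame(c = c("2 to 4", "2 to 4", "5 to 6"), stringsAsFactors = FALSE)

set.seed(1)
# sample(2:4, 1) is evaluated once, so ifelse() recycles that single draw
# into every row where the condition is TRUE.
df$c2 <- ifelse(df$c == "2 to 4", sample(2:4, 1), NA)

df$c2[1] == df$c2[2]  # TRUE: both "2 to 4" rows share one value
is.na(df$c2[3])       # TRUE: non-matching rows get a real NA
```

Because of this, a per-row draw needs a function that is applied to each element separately (or a vectorized draw with one random number per row).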


Solution

  • We can split each string on "to", convert the two pieces to integers, build the range between them, and then use sample() to pick one number from that range.

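    # Split each "lo to hi" string into its two endpoints and draw one integer
    # from lo:hi. as.character() guards against factor columns; lo == hi is
    # handled separately because sample(5, 1) draws from 1:5, not just 5.
    df$c2 <- sapply(strsplit(as.character(df$c1), "\\s+to\\s+"), function(x) {
             vals <- as.integer(x)
             if (vals[1] == vals[2]) vals[1] else sample(vals[1]:vals[2], 1)
    })
    
    df
    # c2 is random, so your values will differ from run to run:
    #      c1 c2
    #1 2 to 4  2
    #2 2 to 4  3
    #3 5 to 6  5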
    

    data

    df<- data.frame(c1 = c("2 to 4","2 to 4","5 to 6"), stringsAsFactors = FALSE)
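For a long column, a vectorized variant avoids calling sample() once per row. Below is a minimal base-R sketch, assuming every entry has the form "lo to hi" with integer endpoints. The helper name random_in_range() is hypothetical, and it swaps sample() for a single runif() call plus floor(), which draws each integer in lo:hi with equal probability and also works when lo == hi.

```r
# Hypothetical helper: one random integer per "lo to hi" string, vectorized.
random_in_range <- function(x) {
  x <- as.character(x)  # guard against factor columns
  pattern <- "^[[:space:]]*(-?[0-9]+)[[:space:]]+to[[:space:]]+(-?[0-9]+)[[:space:]]*$"
  lo <- as.integer(sub(pattern, "\\1", x))  # first endpoint
  hi <- as.integer(sub(pattern, "\\2", x))  # second endpoint
  # runif() lies strictly inside (0, 1), so floor(u * width) is in 0..width-1
  lo + as.integer(floor(runif(length(x)) * (hi - lo + 1)))
}

df <- data.frame(c1 = c("2 to 4", "2 to 4", "5 to 6", "7 to 7"),
                 stringsAsFactors = FALSE)
set.seed(42)
df$c2 <- random_in_range(df$c1)
df
```

Strings that do not match the pattern come back as NA (with a coercion warning). Since the helper is vectorized, it can also be used inside dplyr, e.g. `mutate(df, c2 = random_in_range(c1))`.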