I have a data column that contains a bunch of ranges as strings (e.g. "2 to 4", "5 to 6", "7 to 8" etc.). I'm trying to create a new column that converts each of these values to a random number within the given range. How can I leverage conditional logic within my function to solve this problem?
I think the function should be something along the lines of:
df<-mutate(df, c2=ifelse(df$c=="2 to 4", sample(2:4, 1, replace=TRUE), "NA"))
This should produce a new column in my dataset that replaces every "2 to 4" with a random integer between 2 and 4. However, it is not working: every value is replaced with "NA".
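Part of the problem can be reproduced in base R. ifelse() evaluates sample(2:4, 1) only once, so that single draw is recycled to every matching row instead of giving each row its own draw. Quoting "NA" also turns the result into a character column rather than a missing value. A minimal sketch, using a hypothetical vector x in place of the real column:

```r
set.seed(1)
x <- c("2 to 4", "2 to 4", "5 to 6", "2 to 4")  # hypothetical stand-in for df$c

# sample(2:4, 1) is evaluated once; ifelse() recycles that one value
out <- ifelse(x == "2 to 4", sample(2:4, 1), "NA")
print(out)

# Every "2 to 4" row gets the identical draw, and the result is character
length(unique(out[x == "2 to 4"]))  # 1
class(out)                          # "character"
```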
Ideally, for a column containing:
c("2 to 4", "2 to 4", "5 to 6")
I would like to add a new column such as:
c2 = c(3, 2, 5)
Does anyone have any idea how to do this?
We can split each string on "to", convert the two parts to integers to get the lower and upper bounds, and then use sample to select one number from the range between them.
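df$c2 <- sapply(strsplit(df$c1, "\\s+to\\s+"), function(x) {
  vals <- as.integer(x)  # lower and upper bound
  rng <- vals[1]:vals[2]
  # index into the range: sample(n, 1) on a single number n would draw from 1:n
  rng[sample.int(length(rng), 1)]
})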
df
# c1 c2
#1 2 to 4 2
#2 2 to 4 3
#3 5 to 6 5
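One caveat with sample(): when its first argument is a single number n (with n >= 1), it draws from 1:n rather than returning n. So a degenerate range such as "5 to 5", passed as sample(5:5, 1), can return any of 1 to 5. Indexing into the range with sample.int() avoids this. A sketch, using a hypothetical helper pick_from_range():

```r
# Hypothetical helper: draw one integer uniformly from lo:hi (inclusive).
pick_from_range <- function(lo, hi) {
  rng <- lo:hi
  # sample.int(n, 1) returns an index in 1:n, so a single-element
  # range still returns its own value
  rng[sample.int(length(rng), 1)]
}

set.seed(42)
sample(5:5, 1)         # may be any of 1..5, not necessarily 5
pick_from_range(5, 5)  # always 5
pick_from_range(2, 4)  # one of 2, 3, 4
```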
Data:
df <- data.frame(c1 = c("2 to 4", "2 to 4", "5 to 6"), stringsAsFactors = FALSE)
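For a large column, a vectorized variant may be preferable to calling a function once per row with sapply(). The sketch below is an alternative technique, not the method above: it extracts both bounds with sub() and draws uniformly with runif(). It assumes every entry has the form "<integer> to <integer>" with the lower bound first; the pattern name pat is illustrative.

```r
df <- data.frame(c1 = c("2 to 4", "2 to 4", "5 to 6"), stringsAsFactors = FALSE)

# Assumes each entry looks like "<lo> to <hi>" with lo <= hi
pat <- "^\\s*([0-9]+)\\s+to\\s+([0-9]+)\\s*$"
lo <- as.integer(sub(pat, "\\1", df$c1))
hi <- as.integer(sub(pat, "\\2", df$c1))

# runif() returns values strictly inside (0, 1), so floor(u * (hi - lo + 1))
# is an integer in 0..(hi - lo); adding lo gives a value in lo..hi
df$c2 <- as.integer(lo + floor(runif(nrow(df)) * (hi - lo + 1)))
df
```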