
Sample function repeats same value within ifelse


I have the following dataframe:

d <- structure(list(Store = c("vpm", "vpm", 
"vpm"), Date = structure(c(18042, 18042, 18042), class = "Date"), 
    UniqueImageId = c("vp3_523", "vp3_668", "vp3_523"), EntryTime = structure(c(1558835514, 
    1558834942, 1558835523), class = c("POSIXct", "POSIXt")), 
    ExitTime = structure(c(1558838793, 1558838793, 1558839824
    ), class = c("POSIXct", "POSIXt")), Duration = c(3279, 3851, 
    4301), Age = c(35L, 35L, 35L), EntryPoint = c("Entry2Side", 
    "Entry2Side", "Entry2Side"), ExitPoint = c("Exit2Side", "Exit2Side", 
    "Exit2Side"), AgeNew = c("15_20", "25_32", "15_20"), GenderNew = c("Female", 
    "Male", "Female")), row.names = 4:6, class = c("data.table", 
"data.frame"))

I am trying to populate the column AgeNew with a random number drawn from each row's age range, using the sample function inside an ifelse condition.

I tried the following:

d$AgeNew <- ifelse(d$AgeNew == "0_2",   sample(0:2,  1,replace = TRUE), 
            ifelse(d$AgeNew == "15_20", sample(15:20,1,replace = TRUE), 
            ifelse(d$AgeNew == "25_32", sample(25:36,1,replace = TRUE), 
            ifelse(d$AgeNew == "38_43", sample(36:43,1,replace = TRUE), 
            ifelse(d$AgeNew == "4_6",   sample(4:6,  1,replace = TRUE), 
            ifelse(d$AgeNew == "48_53", sample(48:53,1,replace = TRUE), 
            ifelse(d$AgeNew == "60_Inf",sample(60:65,1,replace = TRUE), 
                                        sample(8:13, 1,replace = TRUE))))))))

But I am getting the same value repeated. For example, for the age group 0_2 only the value 2 is populated. I tried using set.seed

set.seed(123)

and then running the ifelse again, but it still repeats the same value.


Solution

  • This has been discussed elsewhere (I cannot find the source at the moment). It behaves like this because ifelse evaluates each of its yes and no arguments only once, not once per element. sample(..., 1) therefore returns a single value, which is recycled to the length of the test. Consider this example:

    x <- c(1, 2, 1, 2, 1, 2)
    
    ifelse(x == 1, sample(1:10, 1), sample(20:30, 1))
    #[1]  1 26  1 26  1 26
    ifelse(x == 1, sample(1:10, 1), sample(20:30, 1))
    #[1] 10 28 10 28 10 28
    ifelse(x == 1, sample(1:10, 1), sample(20:30, 1))
    #[1]  9 24  9 24  9 24
    

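    As we can see, each branch produces one number, which is recycled across every matching element. To avoid that, set the size of the sample to the length of the test condition in ifelse, so that there is one draw per element. Note that sample draws without replacement by default, so add replace = TRUE if the vector can be longer than the range being sampled.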

    ifelse(x == 1, sample(1:10, length(x)), sample(20:30, length(x)))
    #[1]  7 23  1 26 10 24
    ifelse(x == 1, sample(1:10, length(x)), sample(20:30, length(x)))
    #[1]  3 23  5 26  6 22 
    ifelse(x == 1, sample(1:10, length(x)), sample(20:30, length(x)))
    #[1]  2 30  9 27  1 29
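
For the question's many age groups, an alternative to the nested ifelse is to look up each label's range and draw one value per row. The following is only a sketch under some assumptions: sample_age and age_ranges are hypothetical names introduced here, the ranges are copied as written in the question's code (including 25:36 for "25_32"), and any label not listed falls back to 8:13 as in the question's final branch.

```r
# Ranges copied from the question's nested ifelse (assumption: these are intended).
age_ranges <- list(
  "0_2"    = 0:2,
  "15_20"  = 15:20,
  "25_32"  = 25:36,
  "38_43"  = 36:43,
  "4_6"    = 4:6,
  "48_53"  = 48:53,
  "60_Inf" = 60:65
)

# sample_age() is a hypothetical helper, not part of base R.
# It makes one independent draw for every element of `labels`.
sample_age <- function(labels, ranges, default = 8:13) {
  vapply(labels, function(lbl) {
    pool <- ranges[[lbl]]              # NULL when the label is not in the list
    if (is.null(pool)) pool <- default
    # Draw by index so a one-element pool is not expanded to 1:pool by sample()
    pool[sample.int(length(pool), 1)]
  }, integer(1), USE.NAMES = FALSE)
}

labels <- c("15_20", "25_32", "15_20")  # the AgeNew values from the question
set.seed(123)
ages <- sample_age(labels, age_ranges)
```

With the question's data this would be used as d$AgeNew <- sample_age(d$AgeNew, age_ranges). Rows that share a label can still receive the same number by chance, since each draw is independent.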