
How to generate random numbers in a data.frame with range


I have a list of data.frames, and I want to fill each list element with random numbers drawn from a sequence.

I used the sample function to create the random numbers, but the numbers produced for set [[2]] are almost the same as those for list element [[1]]. How can I create different random numbers for set [[2]]?

Here is the simple code:

data.list <- lapply(1:2, function(x) {
    nrep <- 1
    time <- rep(seq(90,54000,by=90),times=nrep)
    Mx <- rep(sort(sample(seq(0.012,-0.014,length.out = 600),replace=TRUE)), times=nrep)
    My <- rep(sort(sample(seq(0.02,-0.02,length.out = 600),replace=TRUE)), times=nrep)
    Mz <- rep(sort(sample(seq(-1,1,length.out=600),replace=TRUE)), times=nrep)
    data.frame(time,Mx,My,Mz,set_nbr=x)
})

This gives the following first 5 rows for each dataset:

[[1]]
      time       Mx            My           Mz       set_nbr
1      90 -1.391319e-02 -2.000000e-02 -1.000000000       1
2     180 -1.386978e-02 -1.986644e-02 -1.000000000       1
3     270 -1.386978e-02 -1.973289e-02 -0.996661102       1
4     360 -1.382638e-02 -1.973289e-02 -0.993322204       1
5     450 -1.382638e-02 -1.973289e-02 -0.979966611       1  
..     ..  ....            ....         ....           ...

[[2]]

      time       Mx            My           Mz       set_nbr
1      90 -1.395659e-02 -0.0200000000 -1.000000000       2
2     180 -1.391319e-02 -0.0199332220 -0.993322204       2
3     270 -1.386978e-02 -0.0199332220 -0.993322204       2
4     360 -1.386978e-02 -0.0199332220 -0.993322204       2
5     450 -1.382638e-02 -0.0199332220 -0.986644407       2
..     ..  ....            ....         ....           ...

EDIT 1:

Following @bgoldst's answer, I can now produce different numbers:

set.seed(1);
data.list <- lapply(1:2, function(x) {
nrep <- 1;
time <- rep(seq(90,54000,by=90),times=nrep);
Mx <- rep(sort(runif(600,-0.014,0.012)),times=nrep);
My <- rep(sort(runif(600,-0.02,0.02)),times=nrep);
Mz <- rep(sort(runif(600,-1,1)),times=nrep);
data.frame(time,Mx,My,Mz,set_nbr=x);
});

On the other hand, when I change to nrep <- 3; the same numbers are repeated for each repetition. This is what I wanted to avoid from the beginning.

EDIT 2:

@bgoldst showed that replicate does the job!


Solution

  • I think you may have some confusion about how sample() works.

    First, let's examine sample()'s behavior with respect to this simple vector:

    1:5;
    ## [1] 1 2 3 4 5
    

    When you pass a multi-element vector to sample(), it simply randomizes the order of the elements. You'll usually get a different result on each call; more precisely, the longer the vector is, the less likely you are to get the same result twice:

    set.seed(1); sample(1:5); sample(1:5); sample(1:5);
    ## [1] 2 5 4 3 1
    ## [1] 5 4 2 3 1
    ## [1] 2 1 3 4 5
    

    This means that if you sort the result immediately after sampling, you'll get the same result every time. And if the original vector was itself sorted, then the result will also be equal to that original vector. This is true regardless of how sample() randomized the order, because the order is always restored by sort():

    set.seed(1); sort(sample(1:5)); sort(sample(1:5)); sort(sample(1:5));
    ## [1] 1 2 3 4 5
    ## [1] 1 2 3 4 5
    ## [1] 1 2 3 4 5
    

    Now if you add replace=T (or just rep=T if you like to take advantage of partial matching for concision, which I do), then you're not just randomizing the order, you're selecting size elements with replacement, where size is the vector length if you didn't provide size explicitly. This means you can get repeated elements in the result:

    set.seed(1); sample(1:5,rep=T); sample(1:5,rep=T); sample(1:5,rep=T);
    ## [1] 2 2 3 5 2
    ## [1] 5 5 4 4 1
    ## [1] 2 1 4 2 4
    

    And so, if you sort the result, you (likely) won't get back the original vector, because some elements will have been repeated, and some elements will have been omitted:

    set.seed(1); sort(sample(1:5,rep=T)); sort(sample(1:5,rep=T)); sort(sample(1:5,rep=T));
    ## [1] 2 2 2 3 5
    ## [1] 1 4 4 5 5
    ## [1] 1 2 2 4 4
    

    That's exactly what is happening with your code. Your output vectors are different between the two list components, because you're sampling with replacement before sorting, which means different repetitions and omissions of the elements will occur for each list component. But since you're sampling from the same sequence and you're sorting the result, you're bound to get similar-looking results for each list component, even though they're not identical.
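
    To make this concrete, here is a minimal sketch using one of the grids from the question. The seed and the names grid, a and b are illustrative choices, not part of the original code:

    ```r
    set.seed(42)                                  # arbitrary seed, for reproducibility only
    grid <- seq(-0.014, 0.012, length.out = 600)  # the Mx grid from the question
    a <- sort(sample(grid, replace = TRUE))       # first resample, sorted
    b <- sort(sample(grid, replace = TRUE))       # second resample, sorted
    identical(a, b)       # the two resamples are not the same vector...
    all(a %in% grid)      # ...but every value still sits on the same 600-point grid
    max(abs(a - b))       # largest element-wise gap between the two sorted vectors
    length(unique(a))     # distinct grid points that survived the resampling
    ```

    With 600 draws from 600 values, roughly 63% of the grid points are expected to appear at least once, so each sorted resample traces out nearly the same staircase over the same range.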

    I think what you might be looking for is random deviates from a uniform distribution. You can get these from runif():

    set.seed(1); runif(5,-0.014,0.012);
    ## [1] -0.0070967748 -0.0043247786  0.0008941874  0.0096134025 -0.0087562698
    set.seed(1); runif(5,-0.02,0.02);
    ## [1] -0.009379653 -0.005115044  0.002914135  0.016328312 -0.011932723
    set.seed(1); runif(5,-1,1);
    ## [1] -0.4689827 -0.2557522  0.1457067  0.8164156 -0.5966361
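
    Note that the three results above are rescalings of one another, because each call resets the seed to 1 and therefore consumes the same underlying uniform draws. A small check of that, assuming runif(n, min, max) behaves like min + (max - min) * runif(n); the names u and v are only for illustration:

    ```r
    set.seed(1); u <- runif(5)                  # plain draws on [0, 1]
    set.seed(1); v <- runif(5, -0.014, 0.012)   # same seed, different range
    # v should equal u rescaled onto [-0.014, 0.012], up to floating-point tolerance
    isTRUE(all.equal(v, -0.014 + (0.012 - (-0.014)) * u))
    ```

    If you set the seed once and then call runif() several times without resetting it, each call continues the random stream and the vectors are unrelated.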
    

    Thus, your code would become:

    set.seed(1);
    data.list <- lapply(1:2, function(x) {
        nrep <- 1;
        time <- rep(seq(90,54000,by=90),times=nrep);
        Mx <- rep(sort(runif(600,-0.014,0.012)),times=nrep);
        My <- rep(sort(runif(600,-0.02,0.02)),times=nrep);
        Mz <- rep(sort(runif(600,-1,1)),times=nrep);
        data.frame(time,Mx,My,Mz,set_nbr=x);
    });
    

    Which gives:

    lapply(data.list,head);
    ## [[1]]
    ##   time          Mx          My         Mz set_nbr
    ## 1   90 -0.01395224 -0.01994741 -0.9967155       1
    ## 2  180 -0.01394975 -0.01991923 -0.9933909       1
    ## 3  270 -0.01378866 -0.01980934 -0.9905714       1
    ## 4  360 -0.01371306 -0.01977090 -0.9854065       1
    ## 5  450 -0.01371011 -0.01961713 -0.9850108       1
    ## 6  540 -0.01365998 -0.01960718 -0.9846628       1
    ##
    ## [[2]]
    ##   time          Mx          My         Mz set_nbr
    ## 1   90 -0.01398426 -0.01997718 -0.9970438       2
    ## 2  180 -0.01398293 -0.01989651 -0.9931286       2
    ## 3  270 -0.01397330 -0.01988715 -0.9923425       2
    ## 4  360 -0.01396455 -0.01957807 -0.9913645       2
    ## 5  450 -0.01384501 -0.01939597 -0.9892001       2
    ## 6  540 -0.01382531 -0.01931913 -0.9889356       2
    

    Edit: It looked from your question like you wanted the random numbers to be different between list components, that is to say, between the components generated from the 1:2 passed as the first argument to lapply(). The repetition of each random vector nrep times within each list component didn't appear to be relevant, partly because you set nrep to 1, so there wasn't any actual repetition.

    But that's ok, we can achieve this requirement by using replicate() instead of rep(), because replicate() actually runs its expression argument once for every repetition. We also have to flatten the result, because replicate() by default returns a matrix, and we want a straight vector:

    set.seed(1);
    data.list <- lapply(1:2, function(x) {
        nrep <- 2;
        time <- rep(seq(90,54000,by=90),times=nrep);
        Mx <- c(replicate(nrep,sort(runif(600,-0.014,0.012))));
        My <- c(replicate(nrep,sort(runif(600,-0.02,0.02))));
        Mz <- c(replicate(nrep,sort(runif(600,-1,1))));
        data.frame(time,Mx,My,Mz,set_nbr=x);
    });
    lapply(data.list,function(x) x[c(1:6,601:606),]);
    ## [[1]]
    ##     time          Mx          My         Mz set_nbr
    ## 1     90 -0.01395224 -0.01993431 -0.9988590       1
    ## 2    180 -0.01394975 -0.01986782 -0.9948254       1
    ## 3    270 -0.01378866 -0.01981143 -0.9943576       1
    ## 4    360 -0.01371306 -0.01970813 -0.9789037       1
    ## 5    450 -0.01371011 -0.01970022 -0.9697986       1
    ## 6    540 -0.01365998 -0.01969326 -0.9659567       1
    ## 601   90 -0.01396582 -0.01997579 -0.9970438       1
    ## 602  180 -0.01394750 -0.01997375 -0.9931286       1
    ## 603  270 -0.01387607 -0.01995893 -0.9923425       1
    ## 604  360 -0.01385108 -0.01994546 -0.9913645       1
    ## 605  450 -0.01375113 -0.01976155 -0.9892001       1
    ## 606  540 -0.01374467 -0.01973125 -0.9889356       1
    ##
    ## [[2]]
    ##     time          Mx          My         Mz set_nbr
    ## 1     90 -0.01396979 -0.01999198 -0.9960861       2
    ## 2    180 -0.01390373 -0.01995219 -0.9945237       2
    ## 3    270 -0.01390252 -0.01991559 -0.9925640       2
    ## 4    360 -0.01388905 -0.01978123 -0.9890171       2
    ## 5    450 -0.01386718 -0.01967644 -0.9835435       2
    ## 6    540 -0.01384351 -0.01958008 -0.9822988       2
    ## 601   90 -0.01396739 -0.01989328 -0.9971255       2
    ## 602  180 -0.01396433 -0.01985785 -0.9954987       2
    ## 603  270 -0.01390700 -0.01984074 -0.9903196       2
    ## 604  360 -0.01376890 -0.01982715 -0.9902251       2
    ## 605  450 -0.01366110 -0.01979802 -0.9829480       2
    ## 606  540 -0.01364868 -0.01977278 -0.9812671       2
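
    The difference between rep() and replicate() is easier to see on a tiny example. This is only a sketch; the sizes and the names r, m and v are made up for illustration:

    ```r
    set.seed(1)                      # arbitrary seed, for reproducibility only
    r <- rep(runif(3), times = 2)    # runif() runs once; its 3 values are copied
    m <- replicate(2, runif(3))      # runif() runs twice; 3 x 2 matrix, one column per run
    v <- c(m)                        # flatten column by column into a length-6 vector
    identical(r[1:3], r[4:6])        # TRUE: the second half is a copy of the first
    identical(v[1:3], v[4:6])        # FALSE: each half comes from a separate draw
    ```

    Because c() flattens a matrix column by column, each run's values stay contiguous, which is what lines them up with the repeated time column.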