
Use replicate to create new variable


I have the following code:

Ni <- 133     # number of individuals 
MXmeas <- 10   # maximum number of measurements per individual

# simulate number of observations for each individual
Nmeas <- round(runif(Ni, 1, MXmeas))
 
# simulate observation moments (under the assumption that everybody has at least one observation)
obs <- unlist(sapply(Nmeas, function(x) c(1, sort(sample(2:MXmeas, x-1, replace = FALSE)))))
 
# set up dataframe (id, observations)
dat <- data.frame(ID = rep(1:Ni, times = Nmeas), observations = obs)

This results in output like the following (first rows, for ID 1):

ID observations
1             1
1             3
1             4
1             5
1             6
1             8

However, I also want a variable 'times' that numbers the measurements within each individual (1 for the first measurement, 2 for the second, and so on). But since every ID has a different number of rows, I am not sure how to implement this. Does anybody know how to do that? I want it to look like this:

ID observations times
1             1     1
1             3     2
1             4     3
1             5     4
1             6     5
1             8     6

Solution

  • Using dplyr you could group by ID and use the row number for times:

    library(dplyr)
    
    dat |>
      group_by(ID) |>
      mutate(times = row_number()) |>
      ungroup()
    

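    With base R, we could build the counter from the run lengths of the ID variable: rle() returns the length of each run of identical IDs, and sequence() expands each length n into 1, ..., n. This relies on the rows of each ID being contiguous, as they are here: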

    dat$times <- sequence(rle(dat$ID)$lengths)
    

    Output (from the dplyr version):

    # A tibble: 734 × 3
          ID observations times
       <int>        <dbl> <int>
     1     1            1     1
     2     1            3     2
     3     1            9     3
     4     2            1     1
     5     2            5     2
     6     2            6     3
     7     2            8     4
     8     3            1     1
     9     3            2     2
    10     3            5     3
    # … with 724 more rows
    

    Data (using a seed):

    set.seed(1)
    Ni <- 133     # number of individuals 
    MXmeas <- 10   # maximum number of measurements per individual
    
    # simulate number of observations for each individual
    Nmeas <- round(runif(Ni, 1, MXmeas))
    
    # simulate observation moments (under the assumption that everybody has at least one observation)
    obs <- unlist(sapply(Nmeas, function(x) c(1, sort(sample(2:MXmeas, x-1, replace = FALSE)))))
    
    # set up dataframe (id, observations)
    dat <- data.frame(ID = rep(1:Ni, times = Nmeas), observations = obs)
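
The rle() approach above counts within runs of identical IDs, so it only works when all rows of an ID sit next to each other. If that cannot be guaranteed, a base R alternative is ave(), which groups by value rather than by run. The sketch below is only an illustration: the data frame `toy` and the column name `n_meas` are hypothetical and not part of the original question. It also adds the total number of measurements per ID, in case that is what 'times' was meant to hold.

```r
# Hypothetical toy data in which the rows of an ID are NOT contiguous
toy <- data.frame(
  ID           = c(1, 2, 1, 2, 2, 3),
  observations = c(1, 1, 3, 4, 7, 1)
)

# Running index within each ID; ave() splits the row positions by ID,
# applies seq_along() to each group, and puts the results back in row order
toy$times <- ave(seq_along(toy$ID), toy$ID, FUN = seq_along)

# Total number of measurements per ID, repeated on every row of that ID
# (n_meas is an assumed column name)
toy$n_meas <- ave(seq_along(toy$ID), toy$ID, FUN = length)

print(toy)
```

On the simulated `dat`, where the rows are already ordered by ID, `ave(seq_along(dat$ID), dat$ID, FUN = seq_along)` should give the same counter as the rle() version.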