Tags: r, replication, lapply

Create ID over an abnormal sequence


I have the following code that samples 1 row 5 times, 2 rows 5 times, 3 rows 5 times, and so on. After running the lapply and converting the result to a data frame for comparisons, I need a way to turn the ID variable into a grouping variable. Rows 1:5 of "want" would be "group 1", rows 6:15 would be "group 2", rows 16:30 would be "group 3", and so on. These are the groupings because group 1 has only one replicate of each number in the ID column, group 2 has two replicates, group 3 has three replicates, and so on.

Code

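library(dplyr)  # for bind_rows(), mutate() and %>%

iris <- iris

select_rows <- 1:4   # sample sizes: 1, 2, 3 and 4 rows
n_times <- 5         # number of samples drawn at each size
inds <- nrow(iris)

# For each sample size x, draw x random rows from iris, n_times times
result <- lapply(select_rows, function(x)
  replicate(n_times, iris[sample(inds, x), ], simplify = FALSE))

want <- bind_rows(result, .id = 'source')
View(want)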

If I wanted to run an ANOVA on each column, for example, the ID column (source) alone would not group the observations correctly. I suppose I could use a combination of ifelse and mutate to assign the rows to groups manually, but I hope to avoid this because I will need to do it for several different data frames. I also tried the following code to assign groups over a sequence, but realized it won't work because the groups do not all contain the same number of observations:

final <- want %>% mutate(Group = rep(seq(1, ceiling(nrow(want) / 5)), each = 5))

Any help would be appreciated.


Solution

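  • Use the times argument of rep() to repeat each group label a different number of times: five 1's, ten 2's, fifteen 3's, etc. In the line below, dat stands for a data frame with 30 rows (5 + 10 + 15).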

    dat$id <- rep(1:3, times=1:3*5)
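
The same idea can be written in terms of the question's own variables, so it adapts when select_rows or n_times change. The following is a minimal sketch, not a definitive implementation: it rebuilds the sampled data with base R (do.call(rbind, ...) instead of bind_rows, so no packages are needed), and the names group_sizes and Group are chosen here for illustration. It assumes the rows are still in the order in which they were sampled.

```r
# Values from the question
select_rows <- 1:4   # rows drawn per sample
n_times <- 5         # samples drawn at each size

# Rebuild the sampled data with base R only (stand-in for bind_rows)
samples <- lapply(select_rows, function(x)
  replicate(n_times, iris[sample(nrow(iris), x), ], simplify = FALSE))
want <- do.call(rbind, unlist(samples, recursive = FALSE))

# Group k holds n_times samples of k rows each, i.e. k * n_times rows
group_sizes <- select_rows * n_times            # 5 10 15 20

# Repeat each group label as many times as that group has rows
want$Group <- rep(select_rows, times = group_sizes)

# Check the number of rows per group
table(want$Group)
```

Because times is computed from select_rows and n_times, the same two lines should carry over to other data frames built the same way, without any manual ifelse assignments.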