I have a data frame with a column "Tag", here with four different levels. I need help creating the "Seq" column, a sequence generated from the "Tag" column:
df <- data.frame(Tag = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4),
Seq = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3))
Each "Tag" should be divided into 3 sub-groups defined by "Seq". We need to generate runs of 1, 2, and 3 whose total length equals the length of that "Tag". Thus, the length of each run of 1, 2, and 3 depends on the length of each "Tag".
Note that the length of each "Tag" differs. For example, Tag 1 is of length 31, and its "Seq" is 10 times 1, 10 times 2, and 11 times 3.
To begin with, Tag 1 has 31 rows while Tag 2 has 32. In the code below, the first run (1) is never longer than the next two (2, 3), because I used ceiling() to size the last two runs and gave whatever is left to the first. There is no clear criterion for what the code should do when the length is not divisible by 3, e.g. 31: should it give lengths of 10, 10, 11, or would 9, 11, 11 also be fine? The code gives lengths of 9, 11, 11:
ec=table(df$Tag)
unlist(mapply(function(x,y)rep(c(1,2,3),c(x,y,y)),ec-2*ceiling(ec/3),ceiling(ec/3)))
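If the 10, 10, 11 layout is preferred instead, one possible sketch is to give each of the first two runs the integer quotient and put the remainder in the last run. The helper name `split_lengths` and the small stand-in vector `tag` are assumptions made for illustration; they are not part of the question or the code above.

```r
# Hypothetical helper: split a length n into three run lengths,
# giving the first two runs floor(n / 3) and the rest to the last run.
split_lengths <- function(n) {
  base <- n %/% 3
  c(base, base, n - 2 * base)
}

# Stand-in data (assumed): two tags of sizes 31 and 32.
tag <- rep(c(1, 2), times = c(31, 32))

# ave() applies the function within each tag, so the rows do not
# need to be sorted by tag beforehand.
seq_col <- ave(seq_along(tag), tag,
               FUN = function(i) rep(1:3, times = split_lengths(length(i))))

table(tag, seq_col)
```

With this rule a tag of length 31 is split as 10, 10, 11 and a tag of length 32 as 10, 10, 12. Whether that is the intended rule for every tag is not settled by the question, so treat it as one option rather than the answer.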
To check the output, save the result in a variable, d=mapply(... , then run sapply(d,table).
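As a minimal sketch of that check, the snippet below uses a small stand-in vector `tag` (an assumption, with tag sizes 31 and 32) rather than the full data frame. `SIMPLIFY = FALSE` is added here so that `mapply` always returns a list, even when all tags happen to have the same length.

```r
# Stand-in data (assumed): two tags of sizes 31 and 32.
tag <- rep(c(1, 2), times = c(31, 32))
ec <- table(tag)
big <- ceiling(ec / 3)

# One element per tag; keep the result as a list so each tag can be inspected.
d <- mapply(function(x, y) rep(c(1, 2, 3), c(x, y, y)),
            ec - 2 * big, big, SIMPLIFY = FALSE)

# Run lengths per tag: 9, 11, 11 for the tag of size 31
# and 10, 11, 11 for the tag of size 32.
sapply(d, table)
```

The total of each column should equal the size of the corresponding tag, which is a quick way to confirm that no rows were lost or added.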
Hope this will be of help.