I am trying to add some values to my data frame using rep()
to split the data in 2, but my issue is that the number of rows in my data is not divisible by my each argument.
Here is some data I am using. I wish to add the dates 2025-02-20 and 2025-02-19 to the dateCompleted column for anyone aged 40 or above. This works fine when the number of people aged 40 and over is divisible by 2, but I am now trying to split 5 rows by 2, which does not give an integer. I think I need to add a fake row so that my data is divisible by 2?
myDf = data.frame(names=c("john","ben","will","steve","david","harry","bill"), age=c(40,34,42,55,43,30,48), dateCompleted = NA)
This is how I am trying to add the dates:
myDf$dateCompleted[which(myDf$age>=40)] = rep(c("2025-02-20", "2025-02-19"), each = length(myDf[which(myDf$age>=40),])/2)
Example of the answer I get when my data is divisible by 2:
names age dateCompleted
john 40 2025-02-20
will 42 2025-02-20
steve 55 2025-02-20
david 43 2025-02-19
bill 48 2025-02-19
dummy row 40 2025-02-19
You can replace "each" with "times" (or simply omit that argument) and remove the division by 2. R then fills the selected rows from the repeated vector, recycling it if it is too short, so the dates alternate row by row instead of forming two blocks.
# nrow() counts the selected rows; length() on a data frame counts columns
myDf$dateCompleted[which(myDf$age>=40)] <- rep(c("2025-02-20", "2025-02-19"),
times=nrow(myDf[which(myDf$age>=40),]))
myDf
names age dateCompleted
1 john 40 2025-02-20
2 ben 34 <NA>
3 will 42 2025-02-19
4 steve 55 2025-02-20
5 david 43 2025-02-19
6 harry 30 <NA>
7 bill 48 2025-02-20
#Warning message:
#In myDf$dateCompleted[which(myDf$age >= 40)] = rep(c("2025-02-20", :
# number of items to replace is not a multiple of replacement length
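The warning appears because the number of rows being filled (5) is not a multiple of the length of the replacement vector. It is also worth knowing that length() on a data frame returns the number of columns, not rows, so the each = length(...)/2 in the question works out to 3/2 rather than 5/2. A small sketch of the difference, where demoDf, sel, n_cols, n_rows and n_sel are names made up for this illustration:

```r
# Illustrative copy of the question's data (demoDf is a made-up name)
demoDf <- data.frame(names = c("john", "ben", "will", "steve", "david", "harry", "bill"),
                     age = c(40, 34, 42, 55, 43, 30, 48),
                     dateCompleted = NA)

# Row positions of everyone aged 40 or over
sel <- which(demoDf$age >= 40)

n_cols <- length(demoDf[sel, ])  # a data frame is a list of columns, so this counts columns
n_rows <- nrow(demoDf[sel, ])    # number of selected rows
n_sel  <- length(sel)            # same row count, taken from the index vector

print(c(columns = n_cols, rows = n_rows, selected = n_sel))
```

For counting matching rows, nrow() on the subset, length() on the index vector, or sum() on the logical condition all give the row count.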
Update.
You can use indexing if you don't like the warning.
myDf$dateCompleted[which(myDf$age>=40)] <- c("2025-02-20", "2025-02-19")[
rep(c(1,2), length.out=sum(myDf$age>=40, na.rm=TRUE))]
myDf
names age dateCompleted
1 john 40 2025-02-20
2 ben 34 <NA>
3 will 42 2025-02-19
4 steve 55 2025-02-20
5 david 43 2025-02-19
6 harry 30 <NA>
7 bill 48 2025-02-20
Or simply:
myDf$dateCompleted[which(myDf$age>=40)] <- rep(c("2025-02-20", "2025-02-19"),
length.out=sum(myDf$age>=40, na.rm=TRUE))
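If you want two blocks, as in the question's expected output, rather than alternating dates, one possible approach is to combine each with length.out, which avoids the dummy row. This is only a sketch: demoDf, sel, n and dates are illustrative names, and rounding the block size up with ceiling() is an assumption about how an odd count should be split (the first date gets the extra row).

```r
# Illustrative copy of the question's data (demoDf is a made-up name)
demoDf <- data.frame(names = c("john", "ben", "will", "steve", "david", "harry", "bill"),
                     age = c(40, 34, 42, 55, 43, 30, 48),
                     dateCompleted = NA)

sel   <- which(demoDf$age >= 40)   # rows to fill
n     <- length(sel)               # 5 rows in this data
dates <- c("2025-02-20", "2025-02-19")

# each = ceiling(n / 2) repeats every date 3 times (6 values in total);
# length.out = n then trims the result to exactly n values, so no warning is raised
demoDf$dateCompleted[sel] <- rep(dates, each = ceiling(n / 2), length.out = n)

print(demoDf)
```

With this data john, will and steve get 2025-02-20, and david and bill get 2025-02-19.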