
Using rep() to create a sequence when the number of rows in a data frame is not divisible by the replacement length


I am trying to add some values to my data frame using rep() to split the data in two, but my problem is that the number of rows in my data is not divisible by my "each" argument.

Here is the data I am using. I want to fill the dateCompleted column with the dates 2025-02-20 and 2025-02-19 for everyone aged 40 and above, with the first half getting 2025-02-20 and the second half 2025-02-19. This works fine when the number of people aged 40 and over is divisible by 2, but here I am trying to split 5 rows into 2 groups, which does not give an integer. Do I need to add a dummy row so that my data is divisible by 2?

    myDf = data.frame(names=c("john","ben","will","steve","david","harry","bill"),
                      age=c(40,34,42,55,43,30,48),
                      dateCompleted = NA)

This is how I am trying to add the dates:

    myDf$dateCompleted[which(myDf$age>=40)] = rep(c("2025-02-20", "2025-02-19"),
        each = length(myDf[which(myDf$age>=40),])/2)
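A likely reason the count in this attempt is off: length() applied to a data frame returns its number of columns, not its number of rows, so length(myDf[which(myDf$age>=40),]) is 3 here no matter how many people are 40 or over. A quick base-R check (the variable names over40, n_cols, n_rows and n_match are only illustrative):

```r
# Example data from the question
myDf <- data.frame(names = c("john", "ben", "will", "steve", "david", "harry", "bill"),
                   age = c(40, 34, 42, 55, 43, 30, 48),
                   dateCompleted = NA)

# Subset of people aged 40 and over (same subsetting as in the question)
over40 <- myDf[which(myDf$age >= 40), ]

n_cols  <- length(over40)                     # counts columns: names, age, dateCompleted
n_rows  <- nrow(over40)                       # counts the matching rows
n_match <- sum(myDf$age >= 40, na.rm = TRUE)  # same row count, taken from the condition

print(c(n_cols = n_cols, n_rows = n_rows, n_match = n_match))
```

Using nrow() or sum() in place of length() gives the row count of 5 that the rep() call actually needs.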

Example of the output I want, shown with a dummy row added so that the number of rows is divisible by 2:

    names     age dateCompleted
    john       40    2025-02-20
    will       42    2025-02-20
    steve      55    2025-02-20
    david      43    2025-02-19
    bill       48    2025-02-19
    dummy row  40    2025-02-19

Solution

  • You can replace "each" with "times" (or simply omit that argument) and remove the division by 2. R will then recycle the two dates, but they will alternate rather than forming two blocks.

    # length() of a data frame counts its columns, so count the matching rows with sum()
    myDf$dateCompleted[which(myDf$age>=40)] <- rep(c("2025-02-20", "2025-02-19"), 
               times=sum(myDf$age>=40, na.rm=TRUE))
    
    myDf
      names age dateCompleted
    1  john  40    2025-02-20
    2   ben  34          <NA>
    3  will  42    2025-02-19
    4 steve  55    2025-02-20
    5 david  43    2025-02-19
    6 harry  30          <NA>
    7  bill  48    2025-02-20
    

    #Warning message:
    #In myDf$dateCompleted[which(myDf$age >= 40)] <- rep(c("2025-02-20",  :
    #  number of items to replace is not a multiple of replacement length
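The warning comes from R's recycling rule for subassignment: R warns whenever the number of items being replaced is not a multiple of the length of the replacement vector, which includes the case where the replacement is longer than the slots being filled. A minimal base-R sketch of that rule; the helper assign_warns() is hypothetical and written only for this illustration:

```r
# Hypothetical helper for illustration: returns TRUE if assigning `value`
# into all n slots of a length-n character vector raises a warning.
assign_warns <- function(n, value) {
  x <- rep(NA_character_, n)
  warned <- FALSE
  withCallingHandlers(
    x[seq_len(n)] <- value,
    warning = function(w) {
      warned <<- TRUE                  # record that a warning occurred
      invokeRestart("muffleWarning")   # and suppress it
    }
  )
  warned
}

dates <- c("2025-02-20", "2025-02-19")

w_even <- assign_warns(6, dates)                  # 6 slots, 2 values: 6 is a multiple of 2
w_odd  <- assign_warns(5, dates)                  # 5 slots, 2 values: not a multiple
w_long <- assign_warns(5, rep(dates, times = 5))  # 5 slots, 10 values: not a multiple

print(c(even = w_even, odd = w_odd, long = w_long))  # FALSE TRUE TRUE
```

So with 5 rows to fill, any replacement whose length does not divide 5 evenly triggers the warning, even though the first 5 values are still used.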
    

    Update.

    If you want to avoid the warning, you can index into the vector of dates instead, so that exactly one date is produced per matching row.

    myDf$dateCompleted[which(myDf$age>=40)] <- c("2025-02-20", "2025-02-19")[
         rep(c(1,2), length.out=sum(myDf$age>=40, na.rm=TRUE))]
    
    myDf
      names age dateCompleted
    1  john  40    2025-02-20
    2   ben  34          <NA>
    3  will  42    2025-02-19
    4 steve  55    2025-02-20
    5 david  43    2025-02-19
    6 harry  30          <NA>
    7  bill  48    2025-02-20
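To see why the indexed version avoids the warning: rep(c(1,2), length.out = n) builds an index vector of exactly n positions alternating 1, 2, 1, ..., and subsetting the two-element date vector with it yields exactly n dates, so the replacement length equals the number of rows being filled. A small check on the example data (the names dates, n, idx and filled are illustrative):

```r
myDf <- data.frame(names = c("john", "ben", "will", "steve", "david", "harry", "bill"),
                   age = c(40, 34, 42, 55, 43, 30, 48),
                   dateCompleted = NA)
dates <- c("2025-02-20", "2025-02-19")

n <- sum(myDf$age >= 40, na.rm = TRUE)  # number of rows to fill (5)
idx <- rep(c(1, 2), length.out = n)     # alternating positions: 1 2 1 2 1
filled <- dates[idx]                    # exactly n dates, one per row

# n values into n slots, so no recycling warning is raised
myDf$dateCompleted[which(myDf$age >= 40)] <- filled
print(myDf)
```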
    

    Or simply:

    myDf$dateCompleted[which(myDf$age>=40)] <- rep(c("2025-02-20", "2025-02-19"), 
             length.out=sum(myDf$age>=40, na.rm=TRUE))
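The versions above alternate the two dates. If you want two contiguous blocks, as in the example output in the question, one sketch that needs no dummy row is to combine "each" with "length.out": each = ceiling(n / 2) repeats every date enough times to cover half the rows rounded up, and length.out = n trims the result to exactly n values, so for odd n the first date gets the extra row. The names rows, n and block_dates are illustrative:

```r
myDf <- data.frame(names = c("john", "ben", "will", "steve", "david", "harry", "bill"),
                   age = c(40, 34, 42, 55, 43, 30, 48),
                   dateCompleted = NA)
dates <- c("2025-02-20", "2025-02-19")

rows <- which(myDf$age >= 40)  # row positions to fill
n <- length(rows)              # 5 here; rows is a plain index vector, so length() is correct

# each = ceiling(n / 2) gives 3 copies of each date (6 values);
# length.out = n keeps only the first 5, so the first date gets the extra row
block_dates <- rep(dates, each = ceiling(n / 2), length.out = n)

myDf$dateCompleted[rows] <- block_dates
print(myDf)
```

This gives john, will and steve the first date and david and bill the second, matching the example output without the dummy row. If the second date should get the extra row instead, rep(dates, times = c(floor(n / 2), ceiling(n / 2))) does that.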