Assign repeated vector in a dataframe to conditional variables in R


With the following dataframe:

indiv1 <- c('ID1','ID45','ID85','ID41','ID70','ID32','ID21','ID26')
indiv2 <- c('ID12',0,'ID3',0,'ID10','ID8',0,0)
df <- data.frame(indiv1,indiv2)

> df
  indiv1 indiv2
1    ID1   ID12
2   ID45      0
3   ID85    ID3
4   ID41      0
5   ID70   ID10
6   ID32    ID8
7   ID21      0
8   ID26      0

I would like to add a column V3 that assigns the vector c(1,2,3) to the rows where indiv2==0, repeating the vector if there are more such rows than elements in the vector. I tried with the rep function:

df$V3 <- ifelse(df$indiv2==0,rep(1:3,length.out=dim(df[df$indiv2==0,])[1]),0)

> df
  indiv1 indiv2 V3
1    ID1   ID12  0
2   ID45      0  2
3   ID85    ID3  0
4   ID41      0  1
5   ID70   ID10  0
6   ID32    ID8  0
7   ID21      0  3
8   ID26      0  1

But the vector also advances over the rows where indiv2!=0, whereas I would like it to advance only over the rows where indiv2==0:

> df
  indiv1 indiv2 V3
1    ID1   ID12  0
2   ID45      0  1
3   ID85    ID3  0
4   ID41      0  2
5   ID70   ID10  0
6   ID32    ID8  0
7   ID21      0  3
8   ID26      0  1

Solution

  • We can use data.table to do this. Convert the 'data.frame' to a 'data.table' with setDT(df), specify the logical condition in 'i' (indiv2 == 0), replicate 1:3 with length.out set to the number of matching rows (.N), and assign (:=) the result to 'V3'. The rows that did not match are left as NA, so a second step replaces those NA elements with 0.

    library(data.table)
    setDT(df)[indiv2==0, V3 := rep(1:3, length.out= .N)][is.na(V3), V3 := 0]
    df
    #   indiv1 indiv2 V3
    #1:    ID1   ID12  0
    #2:   ID45      0  1
    #3:   ID85    ID3  0
    #4:   ID41      0  2
    #5:   ID70   ID10  0
    #6:   ID32    ID8  0
    #7:   ID21      0  3
    #8:   ID26      0  1
    

    If we are using base R, first create a logical vector marking the rows where indiv2 is 0

    i1 <- df$indiv2 == 0
    

    then fill the 'V3' column only at those positions, repeating 1:3 over the number of TRUE values in 'i1'

    df$V3[i1] <- rep(1:3, length.out = sum(i1))
    

    and replace the remaining NA values with 0

    df$V3[is.na(df$V3)] <- 0
    
    df$V3
    #[1] 0 1 0 2 0 0 3 1
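
The base R steps above can also be folded into one small function that starts from the default value, so no NA replacement is needed afterwards. This is only a sketch: `fill_cycle` and its argument names are hypothetical (not part of base R), and it assumes the condition contains no NA values.

```r
# Rebuild the example data from the question
indiv1 <- c('ID1','ID45','ID85','ID41','ID70','ID32','ID21','ID26')
indiv2 <- c('ID12',0,'ID3',0,'ID10','ID8',0,0)
df <- data.frame(indiv1, indiv2)

# fill_cycle() is a hypothetical helper, not a base R function.
# Assumes 'cond' is a logical vector without NA values.
fill_cycle <- function(cond, values, default = 0) {
  out <- rep(default, length(cond))                  # default everywhere
  out[cond] <- rep(values, length.out = sum(cond))   # cycle only over matching rows
  out
}

df$V3 <- fill_cycle(df$indiv2 == 0, 1:3)
df$V3
#[1] 0 1 0 2 0 0 3 1
```

Because the cycle length comes from `sum(cond)`, the counter advances only over the rows that satisfy the condition.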
    

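    Using ifelse requires the 'yes' and 'no' arguments to match the length of the test; shorter vectors are recycled to that length and values are picked by row position. Here, the vector built with rep has one element per matching row (4), so it is recycled across all 8 rows and the value taken for each matching row depends on its position in the dataframe rather than on how many matches came before it. That is why the attempt in the question gives 2, 1, 3, 1 instead of 1, 2, 3, 1.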
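
To make the note on ifelse concrete, the sketch below reproduces what happens to the 'yes' argument in the question's attempt, then shows one possible way to keep ifelse by giving it a full-length 'yes' vector. The cumsum-based counter is an alternative added here for illustration, not part of the answer above, and the variable names `yes`, `k` and `v3` are made up for the example.

```r
# Logical condition for the example data (df$indiv2 == 0 in the question)
i1 <- c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)

# The 'yes' vector from the question has length 4; ifelse() recycles it to 8 rows
yes <- rep(rep(1:3, length.out = sum(i1)), length.out = length(i1))
yes
#[1] 1 2 3 1 1 2 3 1

# Values are taken by row position, which reproduces the unwanted result
ifelse(i1, yes, 0)
#[1] 0 2 0 1 0 0 3 1

# Assumed alternative: count the matches seen so far, then map that count onto 1:3
k <- cumsum(i1)                        # 0 1 1 2 2 2 3 4
v3 <- ifelse(i1, (k - 1) %% 3 + 1, 0)  # full-length 'yes', so no recycling surprises
v3
#[1] 0 1 0 2 0 0 3 1
```

The modulo step assumes the values to cycle through are exactly 1:3; for an arbitrary vector, the same counter could be used as an index into that vector instead.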