With the following dataframe:
indiv1 <- c('ID1','ID45','ID85','ID41','ID70','ID32','ID21','ID26')
indiv2 <- c('ID12',0,'ID3',0,'ID10','ID8',0,0)
df <- data.frame(indiv1,indiv2)
> df
  indiv1 indiv2
1    ID1   ID12
2   ID45      0
3   ID85    ID3
4   ID41      0
5   ID70   ID10
6   ID32    ID8
7   ID21      0
8   ID26      0
I would like to add a column V3 that assigns the vector c(1,2,3) to the rows where indiv2 == 0 (and 0 elsewhere), repeating the vector when there are more such rows than elements in the vector.
I tried with the rep function:
df$V3 <- ifelse(df$indiv2==0,rep(1:3,length.out=dim(df[df$indiv2==0,])[1]),0)
> df
  indiv1 indiv2 V3
1    ID1   ID12  0
2   ID45      0  2
3   ID85    ID3  0
4   ID41      0  1
5   ID70   ID10  0
6   ID32    ID8  0
7   ID21      0  3
8   ID26      0  1
But it also counts the rows where indiv2 != 0 when stepping through the vector, whereas I would like:
> df
  indiv1 indiv2 V3
1    ID1   ID12  0
2   ID45      0  1
3   ID85    ID3  0
4   ID41      0  2
5   ID70   ID10  0
6   ID32    ID8  0
7   ID21      0  3
8   ID26      0  1
We can use data.table to do this. Convert the 'data.frame' to a 'data.table' (setDT(df)), specify the logical condition in 'i' (indiv2 == 0), replicate 1:3 with length.out equal to the number of rows in that subset (.N), and assign (:=) the result to 'V3'. The rows outside the subset get NA in the new column, so we then replace those NA elements with 0.
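library(data.table)
# Fill V3 with 1:3 recycled over the .N rows where indiv2 == 0 (other rows are left NA),
# then set the NA rows to 0L (integer, matching the type of the new column)
setDT(df)[indiv2 == 0, V3 := rep(1:3, length.out = .N)][is.na(V3), V3 := 0L]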
df
#    indiv1 indiv2 V3
#1:    ID1   ID12  0
#2:   ID45      0  1
#3:   ID85    ID3  0
#4:   ID41      0  2
#5:   ID70   ID10  0
#6:   ID32    ID8  0
#7:   ID21      0  3
#8:   ID26      0  1
If we are using base R, create a logical vector
i1 <- df$indiv2 == 0
then create the 'V3' column by assigning the recycled 1:3 only to the rows selected by 'i1' (the remaining rows are filled with NA)
df$V3[i1] <- rep(1:3, length.out = sum(i1))
and replace the NAs with 0
df$V3[is.na(df$V3)] <- 0
df$V3
#[1] 0 1 0 2 0 0 3 1
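The base R steps above can be wrapped into a small reusable function. This is a sketch, not part of either answer: the name `fill_recycled` and its arguments are hypothetical. It starts from a vector of the default value, so no separate NA-replacement step is needed, and it uses `which()` so that NA values in the condition are treated as FALSE.

```r
# Hypothetical helper (not a base R function): put `values`, recycled over the
# positions where `cond` is TRUE only, into those positions; `default` elsewhere.
fill_recycled <- function(cond, values, default = 0) {
  out <- rep(default, length(cond))  # start with the default everywhere
  idx <- which(cond)                 # positions to fill; NA in cond is skipped
  out[idx] <- rep(values, length.out = length(idx))
  out
}

indiv1 <- c('ID1','ID45','ID85','ID41','ID70','ID32','ID21','ID26')
indiv2 <- c('ID12',0,'ID3',0,'ID10','ID8',0,0)
df <- data.frame(indiv1, indiv2)

df$V3 <- fill_recycled(df$indiv2 == 0, 1:3)
df$V3
#> [1] 0 1 0 2 0 0 3 1
```

Because the counter only advances over the selected positions, the rows where indiv2 != 0 do not consume elements of 1:3.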
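ifelse expects the 'yes' and 'no' arguments to have the same length as 'test'; shorter vectors are recycled to that length, and row i takes element i of 'yes' or 'no'. Here, the rep output has only 4 elements (one per indiv2 == 0 row), so ifelse recycles it over all 8 rows, and the rows where indiv2 != 0 still use up positions in the 1, 2, 3 sequence. That is why the original attempt gives 2, 1, 3, 1 instead of 1, 2, 3, 1.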
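If you want to keep the ifelse approach, one option (not from the answers above) is to make the 'yes' argument full length, indexed by a running count of the indiv2 == 0 rows. `cumsum()` on the logical vector counts only the TRUE rows seen so far, and `(count - 1) %% 3 + 1` maps that count onto 1, 2, 3, 1, ... A minimal sketch:

```r
indiv1 <- c('ID1','ID45','ID85','ID41','ID70','ID32','ID21','ID26')
indiv2 <- c('ID12',0,'ID3',0,'ID10','ID8',0,0)
df <- data.frame(indiv1, indiv2)

# TRUE for the rows that should receive a value from 1:3
i1 <- df$indiv2 == 0
# cumsum(i1) is 1, 2, 3, 4 on the TRUE rows; (x - 1) %% 3 + 1 cycles it through 1:3.
# The 'yes' vector now has one element per row, so ifelse needs no recycling.
df$V3 <- ifelse(i1, (cumsum(i1) - 1) %% 3 + 1, 0)
df$V3
#> [1] 0 1 0 2 0 0 3 1
```

To cycle through a different vector `v`, the same idea becomes `v[(cumsum(i1) - 1) %% length(v) + 1]` as the 'yes' argument.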