r for-loop if-statement duplicates sampling

r: randomly assigning "1" or "2" in a vector based on double-occurrences in another vector

I wrote the code below. It should assign the value 1 or 2 to v2 whenever an element of v1 occurs twice. For example, "A" appears twice in v1, so v2 should read 1 in one of those rows and 2 in the other.

The code mostly works, but in some cases the same number is assigned to both rows of a pair in v2, which should not happen.

Can anybody help me with the issue? Thanks!

v1 <- c(rep(c("A","B","C","D","E","F","G"),rep(2,7)),c("H","I","J","K"))
v2 <- rep(3,length(v1))
df1 <- data.frame(v1,v2)

for (i in 1:length(df1$v1)) {

  if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==3) {

    df1$v2[i] <- sample(c(1,2),1,replace=TRUE)

  } else if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==1) {

    df1$v2[i] <- 2

  } else if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==2) {

    df1$v2[i] <- 1 

  } else { 

    df1$v2[i] <- 2
  }
}

Solution

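If I have understood the requirement correctly, the following should do what you want, using dplyr. Your loop fails because every row still holds the placeholder 3 when it is visited, so only the first branch ever runs and the two rows of a pair are sampled independently of each other. The code below instead assigns the integers 1 to n in random order within each group, without repetition, where n is the number of occurrences of a given letter (so it generalizes beyond your case of 2 occurrences). Note that letters occurring only once get 1 here, whereas the else branch of your loop assigns 2.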

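library(dplyr)

df1 <- data.frame(v1 = c(rep(c("A","B","C","D","E","F","G"), rep(2,7)), c("H","I","J","K")))

df1 <- df1 %>%
  group_by(v1) %>%
  # Within each group, shuffle 1..n so every value is used exactly once;
  # letters that occur only once get 1.
  mutate(v2 = case_when(n() > 1 ~ sample(seq_len(n()), n(), replace = FALSE),
                        TRUE ~ 1L)) %>%
  ungroup()  # drop the grouping so later operations act on the whole table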
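
If you would rather not depend on dplyr, the same idea can be sketched in base R with ave(), which applies a function to each group of identical v1 values. This is only a minimal sketch under my reading of the requirement: the set.seed() call is my own addition so the result is reproducible, and the final check is just a sanity test, not part of the assignment itself.

```r
set.seed(42)  # assumption: fixed seed only to make the example reproducible

v1 <- c(rep(c("A","B","C","D","E","F","G"), each = 2), "H", "I", "J", "K")
df1 <- data.frame(v1 = v1)

# For each group of identical v1 values, draw a random permutation of
# 1..(group size). sample.int(1) is always 1, so singletons get 1.
df1$v2 <- ave(seq_along(df1$v1), df1$v1,
              FUN = function(idx) sample.int(length(idx)))

# Sanity check: within every group, v2 is exactly the set 1..n, no repeats.
ok <- tapply(df1$v2, df1$v1, function(x) setequal(x, seq_along(x)))
stopifnot(all(ok))
```

Because each group receives a permutation rather than independent draws, two rows with the same letter can never end up with the same number.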