
Problem in R with conditional application of a function in a while loop


I have a problem with the following code (see below). At the end of the code I want to add a while loop (see the comments in the code). However, the loop body does not behave as expected when I test it manually (i.e. run that part of the code several times).

library(dplyr)

# Generate data
df = data.frame(var_a = c('A',NA, NA,'A',NA, NA, NA,NA, NA, NA),
                var_b = c('B','B',NA, NA,NA, NA, NA,NA, NA, NA),
                var_c = c(NA,'C','C',NA,NA, NA, NA,NA, NA, NA),
                var_d = c('D',NA,'D','D',NA, NA, NA,NA, NA, NA),
                var_e = c(NA,'Text',NA,'Text','Text', 'Text', NA,NA, 'Text', NA))


# Function to test whether all values in a row are NA across a specified set of variables
test_all_na <- function(df, na_test_vars){
  df_test <- df %>%
    mutate(all_na = rowSums(is.na(.[na_test_vars])) == length(na_test_vars))
  return(df_test)
}

# Specify variables to test if all values in a row equal NA
na_test_vars <- c("var_a", "var_b", "var_c", "var_d")                

# Generate new variable in df with logical vector if all values equal NA
df <- test_all_na(df, na_test_vars)


# Function to select randomly one value from a list of values based on probabilities
select_value <- function(old_value, prob, list_of_values){
  new_value <- sample(list_of_values, 
                      size = 1,
                      prob = prob)
  return(new_value)
}

# Set condition to change values 
condition <- df$all_na == TRUE

# Count occurrences of TRUE in variable all_na
counter = sum(df$all_na)
print(counter)

### HERE SHOULD COME A WHILE LOOP: while counter > 0
# replace values with select_values function in various variables based on the condition
df$var_a <- ifelse(condition, sapply(df$var_a, select_value, prob = c(0.2, 0.8), list_of_values = c('A', NA)), df$var_a)
df$var_b <- ifelse(condition, sapply(df$var_b, select_value, prob = c(0.2, 0.8), list_of_values = c('B', NA)), df$var_b)
df$var_c <- ifelse(condition, sapply(df$var_c, select_value, prob = c(0.2, 0.8), list_of_values = c('C', NA)), df$var_c)
df$var_d <- ifelse(condition, sapply(df$var_d, select_value, prob = c(0.2, 0.8), list_of_values = c('D', NA)), df$var_d)

# Generate again variable in df with logical vector if all values equal NA
df <- test_all_na(df, na_test_vars)

# Count occurrences of TRUE in variable all_na
counter = sum(df$all_na)
print(counter)

### END WHILE LOOP

The first run works. Before the loop, there are 6 rows in which all of the specified variables are NA. After the first run, there are fewer such rows (how many depends on chance). I would expect that with each further run fewer rows satisfy the condition, but this is not what happens: the count changes from run to run, sometimes higher, sometimes lower. So the loop could run forever. Do you know what I am doing wrong? Thanks for any help!


Solution

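  • It seems that you only forgot to update your condition. In your code, condition is computed once before the loop and never refreshed, so the loop does not recognize the rows that have already been completed. The same six rows are re-sampled on every pass, and a row that has received a value can be overwritten with NA again, which is why the count goes up as well as down.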

    The code works fine for me like this:

    while (counter > 0) {
      # Re-sample only the rows that are currently flagged as all NA
      df$var_a <- ifelse(condition, sapply(df$var_a, select_value, prob = c(0.2, 0.8), list_of_values = c('A', NA)), df$var_a)
      df$var_b <- ifelse(condition, sapply(df$var_b, select_value, prob = c(0.2, 0.8), list_of_values = c('B', NA)), df$var_b)
      df$var_c <- ifelse(condition, sapply(df$var_c, select_value, prob = c(0.2, 0.8), list_of_values = c('C', NA)), df$var_c)
      df$var_d <- ifelse(condition, sapply(df$var_d, select_value, prob = c(0.2, 0.8), list_of_values = c('D', NA)), df$var_d)

      # Generate again variable in df with logical vector if all values equal NA
      df <- test_all_na(df, na_test_vars)

      # Update condition so completed rows are no longer re-sampled
      condition <- df$all_na == TRUE

      # Count occurrences of TRUE in variable all_na
      counter = sum(df$all_na)
      print(counter)
    }
    

    Let me know if it's not the answer you were looking for!
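
As a possible alternative, the same idea can be wrapped in a small helper that recomputes the condition at the top of every pass and only touches rows that are still all NA. This is only a sketch under my own assumptions: the function name `fill_all_na_rows`, its `values` and `max_iter` arguments, the shortened example data, and the seed are not part of the original code, and it draws the values with one vectorized `sample()` call per column instead of `sapply()`.

```r
# Hypothetical helper (not from the original post): keep filling rows whose
# values are NA in all of `vars` until no such row is left.
fill_all_na_rows <- function(df, vars, values, prob = c(0.2, 0.8), max_iter = 1000) {
  for (i in seq_len(max_iter)) {
    # Recompute the mask on every pass so completed rows are left alone
    condition <- rowSums(is.na(df[vars])) == length(vars)
    n <- sum(condition)
    if (n == 0) break

    for (j in seq_along(vars)) {
      # One draw per all-NA row: the value with prob[1], NA with prob[2]
      df[[vars[j]]][condition] <- sample(c(values[j], NA), size = n,
                                         replace = TRUE, prob = prob)
    }
  }
  df
}

# Shortened example data (assumption: character columns, not factors)
set.seed(1)  # arbitrary seed, only for reproducibility
df_small <- data.frame(var_a = c("A", NA, NA),
                       var_b = c("B", NA, NA),
                       stringsAsFactors = FALSE)

filled <- fill_all_na_rows(df_small,
                           vars   = c("var_a", "var_b"),
                           values = c("A", "B"))
```

Here `max_iter` is just a safety cap: if it is ever reached, the function returns the partially filled data frame instead of looping forever. Rows that already had a value before the first pass are never modified, because the mask only selects rows that are all NA at that moment.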