Search code examples
rloopsfor-loopif-statementsampling

Rejection sampling loop producing a "length zero" error in R


I am getting an argument is of length zero error that I can't figure out the cause of.

This code is supposed to sample from a range of numbers start and end. Some of the rows have additional dependencies, where certain IDs need to come after other IDs. The code checks for these dependencies by replacing values in columns after1 and after2 with the corresponding sampled values. If the required dependencies are not met, the values are resampled.

When the code works sucessfully, the sampled value should be filled with numbers that meet the required dependencies. The code is run for i iterations, adding a column at the end signifying what run the sampled values correspond to.

I recently adapted the way that I cleaned and prepared my data before entering it into R. I thought that I had transposed the code correctly, but I am getting the following error that wasn't present in the old version, and I'm not sure how to fix it. I've looked around at other posts, but haven't found a solution that is applicable.

Error in if (is.na(filter(dftemp, ID == dftemp[j, k])[6])) { : 
  argument is of length zero
In addition: Warning message:
In as.integer(dftemp[j, k]) : NAs introduced by coercion

Here is the code I am currently working on:

df <- read_csv("sampling sample set.csv", na = c("#VALUE!", "#N/A", ""))
dftemp <- df
dftemp %>% mutate_if(is.factor, as.character) -> dftemp #change factors to characters

for (i in 1:200){ #determines how many iterations to run

  row_list<-as.list(1:nrow(dftemp))
  q<-0

  while(length(row_list)!=0 & q<10){
    q<-q+1 
    for(j in row_list){ #this loop replaces the check values
      skip_flag<-FALSE #initialize skip flag used to check the replacement sampling
      for(k in 4:5){ #checking the after columns
        if(is.na(dftemp[j,k])){ 
          print("NA break")
          print(i)
          break
        } else if(is.na(as.integer(dftemp[j,k]))==FALSE) { #if it's already an integer, we already did this, next
          print("integer next")
          next
          print("integer next")
        } else if(dftemp[j,k]==""){ #check for blank values
          print("empty string next")
          dftemp[j,k]<-NA #if blank value found, replace with NA
          print("fixed blank to NA")
          next 
        } else if(is.na(filter(dftemp,ID==dftemp[j,k])[6])) { #if the replacement has not yet been generated, move on, but set flag to jump this to the end
          skip_flag<-TRUE
          print("skip flag set")
        } else {
          dftemp[j,k]<-as.integer(filter(dftemp,ID==dftemp[j,k])[6]) #replacing IDs with the sampled dates of those IDs
          print("successful check value grab")
        } #if-else
      } #k for loop
      if(skip_flag==FALSE){
        row_list<-row_list[row_list!=j]
      } else {
        next 
      }
      #sampling section
      if(skip_flag==FALSE){
        dftemp[j,6] <- mapply(function(x, y) sample(seq(x, y), 1), dftemp[j,"start"], dftemp[j,"end"])
        dftemp[j,7]<-i #identifying the run number

        if(any(as.numeric(dftemp[j,4:5])>as.numeric(dftemp[j,6]),na.rm=TRUE)){
          print(j)
          while(any(as.numeric(dftemp[j,4:5])>as.numeric(dftemp[j,6]),na.rm=TRUE)){
            dftemp[j,6] <- mapply(function(x, y) sample(seq(x, y), 1), dftemp[j,"start"], dftemp[j,"end"])
          } #while 
          dftemp[j,7]=i 
        }#if
      }
    } #j for loop
  } #while loop wrapper around j loop
  if(i==1){
    dftemp2<-dftemp
  }else{
    dftemp2<-rbind(dftemp2,dftemp)
  }#else

  #blank out dftemp to prepare for another run
  dftemp<-dftemp
  dftemp$sampled <- NA 
  dftemp %>% mutate_if(is.factor, as.character) -> dftemp 

}#i for loop

And here is the sample data.

structure(list(ID = c("a123-1", "b123-1", "c123-1", "d123-1", 
"e123-1", "f123-1", "g123-1", "h123-1", "i123-1", "j123-1", "k123-1", 
"l123-1", "m123-1", "n123-1"), start = c(-5100, -4760, -4930, 
-4930, -5380, -5280, -4855, -4855, -4855, -4855, -4855, -4855, 
-4810, -4810), end = c(-4760, -4420, -4420, -4420, -5080, -5080, 
-4750, -4750, -4750, -4750, -4750, -4750, -4710, -4710), after1 = c(NA, 
NA, NA, NA, NA, NA, NA, "g123-1", "g123-1", NA, "j123-1", "j123-1", 
NA, NA), after2 = c(NA, NA, NA, NA, NA, NA, NA, NA, "h123-1", 
NA, NA, "k123-1", NA, NA), sampled = c(NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA)), class = c("spec_tbl_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -14L), spec = structure(list(
    cols = list(ID = structure(list(), class = c("collector_character", 
    "collector")), start = structure(list(), class = c("collector_double", 
    "collector")), end = structure(list(), class = c("collector_double", 
    "collector")), after1 = structure(list(), class = c("collector_character", 
    "collector")), after2 = structure(list(), class = c("collector_character", 
    "collector")), sampled = structure(list(), class = c("collector_logical", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

Solution

  • As the error message suggests, the problem is happening at the is.na(filter(dftemp,ID==dftemp[j,k])[6]) line, at a minimum. The issue seems to be with what dplyr's filter wants as an input. Consider what is returned in the following calls:

    #returns a tibble with one value
    str(dftemp[8,4])
    
    #returns an empty tibble
    filter(dftemp,ID==dftemp[8,4])
    
    #returns True
    is.data.frame(filter(dftemp,ID==dftemp[8,4]))
    

    filter wants the value directly, not a data frame containing the value. Adding as.character on your subset should resolve this issue. Note this may be happening elsewhere in your code, so there may be other areas where you will need to ensure you have the correct data type. Below is an example:

     #replace line in question with the following:
     is.na(filter(dftemp,ID==as.character(dftemp[8,4]) )[6])
    
    #testing
    if(is.na(filter(dftemp,ID==as.character(dftemp[8,4]) )[6])){print("working")}
    
    #output
    [1] "working"