I am getting an argument is of length zero
error that I can't figure out the cause of.
This code is supposed to sample from a range of numbers start
and end
. Some of the rows have additional dependencies, where certain IDs need to come after other IDs. The code checks for these dependencies by replacing values in columns after1
and after2
with the corresponding sampled values. If the required dependencies are not met, the values are resampled.
When the code works sucessfully, the sampled
value should be filled with numbers that meet the required dependencies. The code is run for i
iterations, adding a column at the end signifying what run the sampled values correspond to.
I recently adapted the way that I cleaned and prepared my data before entering it into R. I thought that I had transposed the code correctly, but I am getting the following error that wasn't present in the old version, and I'm not sure how to fix it. I've looked around at other posts, but haven't found a solution that is applicable.
Error in if (is.na(filter(dftemp, ID == dftemp[j, k])[6])) { :
argument is of length zero
In addition: Warning message:
In as.integer(dftemp[j, k]) : NAs introduced by coercion
Here is the code I am currently working on:
df <- read_csv("sampling sample set.csv", na = c("#VALUE!", "#N/A", ""))
dftemp <- df
dftemp %>% mutate_if(is.factor, as.character) -> dftemp #change factors to characters
for (i in 1:200){ #determines how many iterations to run
row_list<-as.list(1:nrow(dftemp))
q<-0
while(length(row_list)!=0 & q<10){
q<-q+1
for(j in row_list){ #this loop replaces the check values
skip_flag<-FALSE #initialize skip flag used to check the replacement sampling
for(k in 4:5){ #checking the after columns
if(is.na(dftemp[j,k])){
print("NA break")
print(i)
break
} else if(is.na(as.integer(dftemp[j,k]))==FALSE) { #if it's already an integer, we already did this, next
print("integer next")
next
print("integer next")
} else if(dftemp[j,k]==""){ #check for blank values
print("empty string next")
dftemp[j,k]<-NA #if blank value found, replace with NA
print("fixed blank to NA")
next
} else if(is.na(filter(dftemp,ID==dftemp[j,k])[6])) { #if the replacement has not yet been generated, move on, but set flag to jump this to the end
skip_flag<-TRUE
print("skip flag set")
} else {
dftemp[j,k]<-as.integer(filter(dftemp,ID==dftemp[j,k])[6]) #replacing IDs with the sampled dates of those IDs
print("successful check value grab")
} #if-else
} #k for loop
if(skip_flag==FALSE){
row_list<-row_list[row_list!=j]
} else {
next
}
#sampling section
if(skip_flag==FALSE){
dftemp[j,6] <- mapply(function(x, y) sample(seq(x, y), 1), dftemp[j,"start"], dftemp[j,"end"])
dftemp[j,7]<-i #identifying the run number
if(any(as.numeric(dftemp[j,4:5])>as.numeric(dftemp[j,6]),na.rm=TRUE)){
print(j)
while(any(as.numeric(dftemp[j,4:5])>as.numeric(dftemp[j,6]),na.rm=TRUE)){
dftemp[j,6] <- mapply(function(x, y) sample(seq(x, y), 1), dftemp[j,"start"], dftemp[j,"end"])
} #while
dftemp[j,7]=i
}#if
}
} #j for loop
} #while loop wrapper around j loop
if(i==1){
dftemp2<-dftemp
}else{
dftemp2<-rbind(dftemp2,dftemp)
}#else
#blank out dftemp to prepare for another run
dftemp<-dftemp
dftemp$sampled <- NA
dftemp %>% mutate_if(is.factor, as.character) -> dftemp
}#i for loop
And here is the sample data.
structure(list(ID = c("a123-1", "b123-1", "c123-1", "d123-1",
"e123-1", "f123-1", "g123-1", "h123-1", "i123-1", "j123-1", "k123-1",
"l123-1", "m123-1", "n123-1"), start = c(-5100, -4760, -4930,
-4930, -5380, -5280, -4855, -4855, -4855, -4855, -4855, -4855,
-4810, -4810), end = c(-4760, -4420, -4420, -4420, -5080, -5080,
-4750, -4750, -4750, -4750, -4750, -4750, -4710, -4710), after1 = c(NA,
NA, NA, NA, NA, NA, NA, "g123-1", "g123-1", NA, "j123-1", "j123-1",
NA, NA), after2 = c(NA, NA, NA, NA, NA, NA, NA, NA, "h123-1",
NA, NA, "k123-1", NA, NA), sampled = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA)), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -14L), spec = structure(list(
cols = list(ID = structure(list(), class = c("collector_character",
"collector")), start = structure(list(), class = c("collector_double",
"collector")), end = structure(list(), class = c("collector_double",
"collector")), after1 = structure(list(), class = c("collector_character",
"collector")), after2 = structure(list(), class = c("collector_character",
"collector")), sampled = structure(list(), class = c("collector_logical",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
As the error message suggests, the problem is happening at the is.na(filter(dftemp,ID==dftemp[j,k])[6])
line, at a minimum. The issue seems to be with what dplyr's filter
wants as an input. Consider what is returned in the following calls:
#returns a tibble with one value
str(dftemp[8,4])
#returns an empty tibble
filter(dftemp,ID==dftemp[8,4])
#returns True
is.data.frame(filter(dftemp,ID==dftemp[8,4]))
filter
wants the value directly, not a data frame containing the value. Adding as.character
on your subset should resolve this issue. Note this may be happening elsewhere in your code, so there may be other areas where you will need to ensure you have the correct data type. Below is an example:
#replace line in question with the following:
is.na(filter(dftemp,ID==as.character(dftemp[8,4]) )[6])
#testing
if(is.na(filter(dftemp,ID==as.character(dftemp[8,4]) )[6])){print("working")}
#output
[1] "working"