
Undefined Columns Selected v. duplicate 'row.names' are not allowed


Within a for loop, I am trying to run a function on two columns of my data frame and move to another data set on every iteration of the loop. I would like to collect the result of every iteration into one vector of answers.

I can't get past the following errors (listed below my code). Which one I get depends on whether I add or remove row.names = NULL in the data <- read.csv(...) call (line 4 of the for loop):

** Edited to include directory references, where the error ultimately was:

corr <- function(directory, threshold = 0) {
  source("complete.R")

  # Note: the lines above, together with my directory organization (not shown), were where my error was

  lookup <- complete("specdata")
  setwd(paste0(getwd(), "/", directory))
  files <- list.files(full.names = TRUE)  # read file names
  len <- length(files)
  answer2 <- vector("numeric")
  answer <- vector("numeric")
  dataN <- data.frame()
  for (i in 1:len) {
    if (lookup[i, "nobs"] > threshold) {
      # TRUE -> read that file and keep only its complete rows
      data <- read.csv(file = files[i], header = TRUE, sep = ",")
      # remove incomplete rows
      dataN <- data[complete.cases(data), ]
      # compute the correlation and append it to the vector of results
      answer <- cor(dataN[, "sulfate"], dataN[, "nitrate"])
      answer2 <- c(answer2, answer)
    }
  }

  setwd("../")
  return(answer2)
}

1) Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed

vs.

2) Error in `[.data.frame`(data, , 2:3) : undefined columns selected

What I've tried

  1. referring to the columns directly by name (e.g. "colA")
  2. initializing data and dataN to empty data.frames before the for loop
  3. initializing answer2 to an empty vector
  4. getting a better understanding of how vectors, matrices and data.frames work with each other

Thank you!


Solution

  • My problem was that an .R function file referenced in the code above was in the same directory as the data files I was looping through and analyzing. My "files" vector had the wrong length because list.files() also picked up that other .R function file, which I had written and referenced earlier in the function. I believe this R file is what produced the 'undefined columns selected' error.

    I apologize: I originally did not even post the part of the code where the problem lay.

    Key Takeaway: You can always move between directories within a function! In fact, it may be necessary if you want to apply a function to all the contents of a directory of interest.
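
A minimal sketch of one way to guard against this, not the original corr()/complete() code. It assumes the data files end in .csv and contain sulfate and nitrate columns, as in the question. The function name corr_sketch, the temporary directory, and the file helper.R are hypothetical and exist only for illustration. Two things differ from the code above: list.files() is given a pattern so that stray .R files are never read, and the working directory is restored with on.exit() rather than with setwd("../"). The threshold is also checked against the number of complete rows counted in the loop instead of a separate positional lookup table, which is a substitution made here to keep the example self-contained.

```r
# Sketch only: corr_sketch and the demo files below are hypothetical.
corr_sketch <- function(directory, threshold = 0) {
  old_wd <- setwd(directory)          # setwd() returns the previous directory
  on.exit(setwd(old_wd), add = TRUE)  # restore it on exit, even after an error

  # Match only the CSV data files, so stray .R scripts are never read
  files <- list.files(pattern = "\\.csv$")

  result <- numeric(0)
  for (f in files) {
    data <- read.csv(f)
    complete_rows <- data[complete.cases(data), ]
    # Assumption: compare the threshold with the complete rows counted here
    if (nrow(complete_rows) > threshold) {
      result <- c(result, cor(complete_rows$sulfate, complete_rows$nitrate))
    }
  }
  result
}

# Demo data: two small CSV files plus a stray .R file in the same directory
demo_dir <- tempfile("specdata_demo")
dir.create(demo_dir)
write.csv(data.frame(sulfate = c(1, 2, 3, NA), nitrate = c(2, 4, 6, 1)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(3, 2, 1), nitrate = c(1, 2, 3)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)
writeLines("helper <- function() NULL", file.path(demo_dir, "helper.R"))

n_all <- length(list.files(demo_dir))                       # 3, includes helper.R
n_csv <- length(list.files(demo_dir, pattern = "\\.csv$"))  # 2, CSV files only

start_wd <- getwd()
correlations <- corr_sketch(demo_dir)  # one correlation per CSV file
```

Comparing n_all with n_csv shows the length mismatch described above: an unfiltered listing counts the .R file as if it were a data file. Because of on.exit(), the working directory after the call is the same as before it, even if read.csv() or cor() had failed partway through the loop.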