How do you make a data frame with unequal row lengths?


I have a large data frame, ExprsData, with several numeric and NA values. It looks something like this:

Patient Gene_A Gene_C Gene_D
patient1 12 16 NA
patient2 15 NA 20

My data frame has 15 rows and 14 columns.

I have made a function that is meant to scale and center the values in my data frame:

MyScale <- function (x, scale, center){
  removena <- x[!is.na(x)] #remove the NA values 
  meanofdata <- mean(removena) #calc the mean 
  stdofdata <- sd(removena) #calc the std
  
  
if (scale==TRUE){ #if scale is true
  calcvec <- (removena - meanofdata)/stdofdata 
  return(calcvec)
}else if (center ==TRUE){ #if center is true
  centervec <- removena - meanofdata
  return(centervec)
}
} 

I tested my function by running it on a single column of my data frame like this:

MyScale (ExprsData$Gene_C, scale = TRUE, center = TRUE)

It works great!

Next, I want to be able to apply my function to my entire data frame, have it output as a data frame, assign it to an object and then save as a csv.

To do this I tried this:

ExprsDataScaled <- as.data.frame(lapply(ExprsData, function(x) MyScale(x = x, scale = TRUE, center = TRUE)))
write.csv(ExprsDataScaled,"?path//filename.csv", row.names = TRUE)

However, when I try to apply my function to my entire data frame, I get the following error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
  arguments imply differing number of rows: 14, 15

I understand that I am getting this error message because my columns differ in length. I know this is because in my function, I have it remove the NA values. I need to do this because otherwise I run into a lot of errors when I try to scale and center later in the function.
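
The length mismatch can be reproduced with a minimal sketch. The column values below are made up for illustration and are not taken from ExprsData; the point is only that dropping NA values per column leaves vectors of different lengths, which as.data.frame() refuses to combine:

```r
# Toy columns (hypothetical values); Gene_C contains one NA
cols <- list(Gene_A = c(12, 15, 11), Gene_C = c(16, NA, 14))

# Dropping NAs per column, as the original MyScale does,
# leaves Gene_A with 3 values and Gene_C with 2
dropped <- lapply(cols, function(x) x[!is.na(x)])
print(lengths(dropped))

# as.data.frame() needs columns of compatible length, so this fails;
# tryCatch() captures the error message instead of stopping the script
result <- tryCatch(as.data.frame(dropped),
                   error = function(e) conditionMessage(e))
print(result)
```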

Is there a way to make a data frame with unequal columns? Is there a way to re-insert "NA" back into my data frame once it has been scaled and centered to avoid this error? Or a way to insert blank cells in some columns so they can all be the same length?


Solution

  • This is a better version of your function. It does not remove any NA values from your data, so every column keeps its original length.

    (However, the function will still fail on non-numeric values of x, and it returns NULL when scale and center are both FALSE. It is also worth asking why a scaling function needs a scale argument at all.)

    MyScale <- function (x, scale, center){
      meanofdata <- mean(x, na.rm = TRUE)
      stdofdata <- sd(x, na.rm = TRUE)
      
      if (scale==TRUE){
        calcvec <- (x - meanofdata)/stdofdata 
        return(calcvec)
      }else if (center ==TRUE){
        centervec <- x - meanofdata
        return(centervec)
      }
    }
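
As a hedged sketch of how the NA-preserving function could then be applied to the whole data frame: the toy ExprsData below uses made-up values, and the names num_cols and out_file are hypothetical. It assumes the Patient column is non-numeric and should be left untouched, which is why only the numeric columns are passed to MyScale.

```r
# NA-preserving MyScale, repeated here so the sketch runs on its own
MyScale <- function(x, scale, center) {
  meanofdata <- mean(x, na.rm = TRUE)
  stdofdata <- sd(x, na.rm = TRUE)
  if (scale == TRUE) {
    (x - meanofdata) / stdofdata
  } else if (center == TRUE) {
    x - meanofdata
  }
}

# Toy stand-in for ExprsData (values are made up for illustration)
ExprsData <- data.frame(
  Patient = c("patient1", "patient2", "patient3"),
  Gene_A = c(12, 15, 18),
  Gene_C = c(16, NA, 20),
  Gene_D = c(NA, 20, 24)
)

# Scale only the numeric columns; Patient is copied over unchanged
num_cols <- sapply(ExprsData, is.numeric)
ExprsDataScaled <- ExprsData
ExprsDataScaled[num_cols] <- lapply(ExprsData[num_cols], MyScale,
                                    scale = TRUE, center = TRUE)

# Write to a temporary file; substitute your own path here
out_file <- tempfile(fileext = ".csv")
write.csv(ExprsDataScaled, out_file, row.names = FALSE)
```

Because the NA values stay in their original positions, all columns have the same number of rows and the result is already a data frame. Base R's scale() function does the same centering and scaling on numeric columns and also leaves NA values in place, so it may be an alternative to a hand-written function.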