Tags: r, dataframe, vector, assign, recycle

Assign multiple columns via vector without recycling


I am importing measurement data as a dataframe and want to add the experimental conditions, which are encoded in the filename, to the data. I want to add new columns to the dataframe that represent the conditions, and fill each column with the value specified by the filename. Later, this will make it easier to compare experimental conditions once I merge the edited dataframes from each individual sample/file.

Here is an example of my pre-existing dataframe Measurements:

Measurements <- data.frame(
  X = 1:4,
  Length = c(130, 150, 170, 140)
)

Here are example vectors of the variable names and their values, which would be derived from the filename:

FileVars.vec <- c("Condition", "Plant")

FileInfo.vec <- c("aKG", "1")

Here is one way I have found to do what I want:

# Assign each value to its own column, one column per iteration
for (i in seq_along(FileVars.vec)) {
  Measurements[FileVars.vec[i]] <- FileInfo.vec[i]
}

Which gives the desired output:

 X  Length Condition Plant
 1  130    aKG       1  
 2  150    aKG       1  
 3  170    aKG       1  
 4  140    aKG       1

But my (limited) understanding is that R is a vectorized language in which for-loops are often unnecessary. I feel like this simpler code should work:

Measurements[FileVars.vec] <- FileInfo.vec

But instead of assigning one value to each entire column, it recycles the values down each column:

X   Length Condition Plant
1   130    aKG       aKG    
2   150    1         1  
3   170    aKG       aKG    
4   140    1         1
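The pattern above matches how R fills a matrix column by column. As a hedged illustration (assuming `[<-.data.frame` reshapes an atomic right-hand side into an n-by-p matrix, which is consistent with the output shown), filling a 4-by-2 matrix with the same two values reproduces both columns exactly:

```r
FileInfo.vec <- c("aKG", "1")

# Column-major fill: the 2 values are recycled until all 4 * 2 cells are used,
# so each column becomes "aKG", "1", "aKG", "1".
filled <- matrix(FileInfo.vec, nrow = 4, ncol = 2)
print(filled)

# The vectorized assignment from the question produces the same columns.
Measurements <- data.frame(X = 1:4, Length = c(130, 150, 170, 140))
Measurements[c("Condition", "Plant")] <- FileInfo.vec
print(Measurements)
```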

Is there a way to do a similar simple assignment without recycling, i.e. so that each value fills one entire column? I imagine there's a simple formatting fix, but I've searched for a solution for more than 6 hours and nowhere did I see an assignment like this. I have also thought of creating a separate dataframe containing just the experimental conditions and then merging it with the actual dataframe, but that seems more roundabout to me, especially with more experimental conditions and observations than in these examples.
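For comparison, the separate-dataframe route mentioned above can be fairly short. A minimal sketch, assuming the conditions fit in a one-row data frame (`Conditions` and `Merged` are hypothetical names): `merge()` with `by = NULL` returns the Cartesian product of the two data frames, so each measurement row is paired with the single row of conditions.

```r
Measurements <- data.frame(X = 1:4, Length = c(130, 150, 170, 140))
FileVars.vec <- c("Condition", "Plant")
FileInfo.vec <- c("aKG", "1")

# Hypothetical one-row data frame holding the experimental conditions.
Conditions <- as.data.frame(setNames(as.list(FileInfo.vec), FileVars.vec),
                            stringsAsFactors = FALSE)

# by = NULL: no key columns, so merge() forms every combination of rows,
# here 4 measurement rows x 1 conditions row = 4 rows.
Merged <- merge(Measurements, Conditions, by = NULL)
print(Merged)
```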

Also, if there is a more established pipeline/package for taking information from the filename and adding it to the data in a tidy fashion, that would be marvelous as well! The original filename would be something like:

"aKG_1.csv"

Thank you for helping an R noobie! May you receive good coding karma when debugging!


Solution

  • Convert the vector to a list before assigning; this avoids the column-wise recycling of values. Because the value is a list, each element is treated as a unit and assigned to its respective column, and that single value is then recycled down all the rows.

    Measurements[FileVars.vec] <-  as.list(FileInfo.vec)
    

    Output:

    Measurements
    #  X Length Condition Plant
    #1 1    130       aKG     1
    #2 2    150       aKG     1
    #3 3    170       aKG     1
    #4 4    140       aKG     1
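A small variant of the same idea, offered as a sketch: keeping the names and values together in one named list (`conds` is a hypothetical name) means there are no longer two parallel vectors to keep in sync.

```r
Measurements <- data.frame(X = 1:4, Length = c(130, 150, 170, 140))
FileVars.vec <- c("Condition", "Plant")
FileInfo.vec <- c("aKG", "1")

# Pair each column name with its value; as.list() makes every value its own
# list element, so each one is recycled down its own column only.
conds <- setNames(as.list(FileInfo.vec), FileVars.vec)
Measurements[names(conds)] <- conds
print(Measurements)
```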
    

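    Because FileInfo.vec is a character vector, Plant is stored as character. If we want to reset the column types, use type.convert (as.is = TRUE keeps text columns as character rather than converting them to factors):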

    Measurements <- type.convert(Measurements, as.is = TRUE)
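To see what type.convert actually changes, one can compare each column's class before and after. This is a sketch assuming a version of R in which type.convert() accepts a data frame (recent versions do). Note that every column is re-parsed, so a numeric column holding whole numbers, such as Length here, may also come back as integer.

```r
Measurements <- data.frame(X = 1:4, Length = c(130, 150, 170, 140))
Measurements[c("Condition", "Plant")] <- as.list(c("aKG", "1"))

before <- sapply(Measurements, class)  # Plant is "character" at this point

# Re-guess each column's type from its text representation.
Measurements <- type.convert(Measurements, as.is = TRUE)
after <- sapply(Measurements, class)   # Plant is now "integer"

print(rbind(before, after))
```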
    

    Note that because FileInfo.vec is created as a vector, it can hold only a single type, i.e. character. If we want the new columns to have different types, supply a list instead:

    Measurements[FileVars.vec] <- list("aKG", 1)
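A quick check of the mixed-type list assignment, as a sketch: each list element keeps its own type, so Condition ends up character and Plant numeric. Writing `1L` instead of `1` would give an integer column rather than a double one.

```r
Measurements <- data.frame(X = 1:4, Length = c(130, 150, 170, 140))

# A list can hold elements of different types, so each new column keeps
# its own class: Condition is character, Plant is numeric (double).
Measurements[c("Condition", "Plant")] <- list("aKG", 1)
print(sapply(Measurements, class))
```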
    

    For the second part of the question, if we have a string

    str1 <- "aKG_1.csv"
    

    and want to create two columns from it, use either read.table (or read.csv) or strsplit:

    # Strip ".csv", then split "aKG_1" on "_" into a one-row data frame (V1, V2)
    Measurements[FileVars.vec] <- read.table(text = tools::file_path_sans_ext(str1),
                                             sep = "_", header = FALSE)
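The strsplit route, extended to several files, might look like the sketch below. It assumes every filename follows the `Condition_Plant.csv` pattern. The helper `read_with_conditions` and the demo files written to a temporary directory are hypothetical, created only so the example runs on its own; in practice `files` would come from the real data folder.

```r
FileVars.vec <- c("Condition", "Plant")

# Hypothetical helper: read one CSV and add one column per filename field.
read_with_conditions <- function(path, vars) {
  dat <- read.csv(path)
  # "dir/aKG_1.csv" -> "aKG_1" -> c("aKG", "1")
  fields <- strsplit(tools::file_path_sans_ext(basename(path)), "_")[[1]]
  stopifnot(length(fields) == length(vars))
  dat[vars] <- as.list(fields)  # a list avoids column-wise recycling
  dat
}

# Demo data: two small files in a temporary directory.
demo_dir <- file.path(tempdir(), "measurements_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Length = c(130, 150, 170, 140)),
          file.path(demo_dir, "aKG_1.csv"), row.names = FALSE)
write.csv(data.frame(Length = c(120, 160)),
          file.path(demo_dir, "ctrl_2.csv"), row.names = FALSE)

# Read every CSV, tag it with its conditions, stack the results,
# then let type.convert() turn Plant into a number.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
All <- do.call(rbind, lapply(files, read_with_conditions, vars = FileVars.vec))
All <- type.convert(All, as.is = TRUE)
print(All)
```

Stacking with do.call(rbind, ...) assumes all files share the same measurement columns; files with differing columns would need a more forgiving combine step.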