Beginner using pipes


I am a beginner and I'm trying to find the most efficient way to change the name of the first column for many CSV files that I will be creating. Once I have created the CSV files, I am loading them into R as follows:

data <- read.csv('filename.csv')

I have used the names() function to do the name change of a single file:

names(data)[1] <- 'Y'

However, I would like to find the most efficient way of combining/piping this name change to read.csv so the same name change is applied to every file when they are opened. I tried to write a 'simple' function to do this:

addName <- function(data) {
  names(data)[1] <- 'Y'
  data
}

However, I do not yet fully understand the syntax for writing a function and I can't get this to work.


Solution

  • Note

    If you were expecting your original addName function to "mutate" an existing object like so

    x <- data.frame(Column_1 = c(1, 2, 3), Column_2 = c("a", "b", "c"))
    
    # Try (unsuccessfully) to change title of "Column_1" to "Y" in x.
    addName(x)
    
    # Print x.
    x
    

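    please be aware that R function arguments behave as if passed by value rather than by reference (R uses copy-on-modify semantics), so addName() changes only its own local copy and x itself would remain unchanged: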

      Column_1 Column_2
    1        1        a
    2        2        b
    3        3        c
    

    Any "mutation" would be achieved by overwriting x with the return value of the function

    x <- addName(x)
    
    # Print x.
    x
    

    in which case x itself would obviously be changed:

      Y Column_2
    1 1        a
    2 2        b
    3 3        c
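
    Because addName() returns the renamed data.frame, the asker's original function can also be combined with read.csv() directly, either by nesting the calls or with a pipe. The following is a minimal sketch, assuming the magrittr package is installed; demo_path is a hypothetical throwaway file created only so the example runs on its own.

    ```r
    library(magrittr)  # Provides the %>% pipe.

    # The function from the question, unchanged.
    addName <- function(data) {
      names(data)[1] <- "Y"
      data
    }

    # Hypothetical demo file, written to a temporary location.
    demo_path <- tempfile(fileext = ".csv")
    write.csv(data.frame(Column_1 = c(1, 2, 3), Column_2 = c("a", "b", "c")),
              demo_path, row.names = FALSE)

    # Nested call: read the file, then rename the first column.
    nested <- addName(read.csv(demo_path))

    # Equivalent pipe: the data.frame from read.csv() becomes addName()'s argument.
    piped <- read.csv(demo_path) %>% addName()

    names(piped)
    ```

    Either form keeps the read and the rename in one expression, so the unrenamed data.frame never needs a name of its own.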
    

    Answer

    Anyway, here's a solution that compactly incorporates pipes (%>% from the magrittr package) and a custom function. Please note that without the linebreaks and comments, which I have added for clarity, this could be condensed to only a few lines of code.

    # The dplyr package helps with easy renaming, and it includes the magrittr pipe.
    library(dplyr)
    
    # ...
    
    filenames <- c("filename1.csv", "filename2.csv", "filename3.csv")
    
    # A function to take a CSV filename and give back a renamed dataset taken from that file.
    addName <- function(filename) {
      # Read in the named file as a data.frame.
      read.csv(file = filename) %>%
        # Take the resulting data.frame, and rename its first column as "Y";
        # quotes are optional, unless the name contains spaces: "My Column"
        # or `My Column` are needed then.
        dplyr::rename(Y = 1)
    }
    
    # Get a list of all the renamed datasets, as taken by addName() from each of the filenames.
    all_files <- sapply(filenames, FUN = addName,
                        # Keep the list structure, in which each element is a
                        # data.frame.
                        simplify = FALSE,
                        # Name each list element by its filename, to help keep track.
                        USE.NAMES = TRUE)
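
    For comparison, the same read-and-rename pattern can be sketched in base R alone, without dplyr or pipes. This is only an illustration under stated assumptions: read_renamed(), tmp_dir, demo_files, and demo_data are hypothetical names, and the two CSV files are throwaway files created so the example is self-contained.

    ```r
    # Hypothetical helper: read one CSV file and rename its first column to "Y".
    read_renamed <- function(filename) {
      data <- read.csv(filename)
      names(data)[1] <- "Y"
      data
    }

    # Create two small throwaway CSV files in a temporary directory.
    tmp_dir <- tempfile("csv_demo_")
    dir.create(tmp_dir)
    write.csv(data.frame(Column_1 = 1:3, Column_2 = c("a", "b", "c")),
              file.path(tmp_dir, "file1.csv"), row.names = FALSE)
    write.csv(data.frame(Column_1 = 4:6, Column_2 = c("d", "e", "f")),
              file.path(tmp_dir, "file2.csv"), row.names = FALSE)

    # Collect the full paths of every CSV file in that directory.
    demo_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

    # lapply() always returns a list, one data.frame per file.
    demo_data <- lapply(demo_files, read_renamed)

    # Name each list element by its filename, to help keep track.
    names(demo_data) <- basename(demo_files)

    names(demo_data[["file1.csv"]])
    ```

    Using list.files() avoids typing the filenames by hand, which may help when many CSV files are created over time.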
    

    In fact, you could rename as many columns as you like in a single call, by using a step like this in place of the rename step inside addName():

    dplyr::rename(Y = 1, 'X' = 2, "Z" = 3, "Column 4" = 4, `Column 5` = 5)
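
    As a small self-contained illustration of renaming several columns by position, here is a sketch on an in-memory data.frame rather than a file. It assumes dplyr is installed; the data.frame and the new column names are made up for the example.

    ```r
    library(dplyr)

    # A made-up three-column data.frame.
    x <- data.frame(Column_1 = c(1, 2, 3),
                    Column_2 = c("a", "b", "c"),
                    Column_3 = c(TRUE, FALSE, TRUE))

    # Rename columns 1 to 3 by position; a name containing a space must be
    # quoted or wrapped in backticks.
    renamed <- x %>%
      dplyr::rename(Y = 1, "My Column" = 2, `Z` = 3)

    names(renamed)
    ```

    As with addName(), x itself is left untouched; only the returned data.frame carries the new names.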