How to remove rows of multiple .csv files


I'm not sure of the best way to go about this, so I would appreciate any suggestions for a more efficient approach!

To start, here is some toy data:

data <- data.frame(
  "stim" = c("face", "object", " ", "pareidolia"),
  "RT" = c(23, 24, 22, 25),
  "Opac" = c(70, 60, 80, 65)
)

write.csv(data, "data1.csv")

data <- data.frame(
  "stim" = c("face", "pareidolia", " ", "pareidolia"),
  "RT" = c(83, 24, 52, 85),
  "Opac" = c(70, 87, 8, 6)
)

write.csv(data, "data2.csv")

I am currently reading in multiple .csv files like so:

library("tidyverse")
base::setwd("filepath")
files <- base::list.files(
  path = "filepath",
  recursive = TRUE,
  pattern = "*.csv"
)

I am then creating a list of dataframes with the .csv file contents and removing rows that correspond to blank cells in the stim column of each dataframe. Then, I am making filepaths from list names.

datalist <- lapply(files, read.csv)

with(datalist, subset(datalist, !("stim" == ""))) -> datalist

file_out <- paste0(names(datalist), ".csv")

Finally, I am attempting to write each dataframe in the list to its own .csv file.

mapply(
  function(x, y) write_csv(x, y), 
  datalist, 
  file_out
)

The problem is that this code doesn't seem to work. It doesn't output any .csv file except for a single file titled simply ".csv", and when I try to open that file, I get an error stating that the file doesn't exist. Is there a better way to go about this process to achieve the desired results? Thank you so much.


Solution

  • First, create the files to be processed.

    data <- data.frame(
      "stim" = c("face", "object", " ", "pareidolia"),
      "RT" = c(23, 24, 22, 25),
      "Opac" = c(70, 60, 80, 65)
    )
    
    write.csv(data, "data1.csv")
    
    data <- data.frame(
      "stim" = c("face", "pareidolia", " ", "pareidolia"),
      "RT" = c(83, 24, 52, 85),
      "Opac" = c(70, 87, 8, 6)
    )
    
    write.csv(data, "data2.csv")
    

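Before the fix, it may help to see why the original attempt ended up with a single file named ".csv". The sketch below uses two small hypothetical data frames as stand-ins for the result of lapply(files, read.csv); the object names blank_raw and blank_trimmed are illustrative only. It shows three things: the list has no names, the filter compares string literals rather than the column, and the blank cells hold a space rather than an empty string.

```r
# Hypothetical stand-ins for two files read with lapply(files, read.csv)
datalist <- list(
  data.frame(stim = c("face", " "), RT = c(23, 22)),
  data.frame(stim = c(" ", "pareidolia"), RT = c(52, 85))
)

# lapply() over an unnamed character vector returns an unnamed list ...
names(datalist)
#> NULL

# ... so paste0() has nothing to prepend and yields one bare extension.
file_out <- paste0(names(datalist), ".csv")
file_out
#> [1] ".csv"

# "stim" here is a string literal, not the column, so the condition has the
# same value for every row and no row is singled out.
!("stim" == "")
#> [1] TRUE

# Even when the column itself is tested, the blank cells hold a space (" "),
# not "", so they only match after trimming.
blank_raw <- datalist[[1]]$stim == ""
blank_trimmed <- trimws(datalist[[1]]$stim) == ""
```

Because file_out has length one, mapply recycles it across the list, so every data frame is sent to the same ".csv" path.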
    Now for the problem in the question.

    • Get the full filenames with list.files (this assumes the files live in ~/Temp).
    • For each filename, lapply an anonymous function that reads the data, trims whitespace from column stim so that blank cells become empty strings, subsets the data.frame, and rewrites it to disk.

    files <- list.files(
      path = "~/Temp",
      pattern = "data.*\\.csv",
      full.names = TRUE
    )
    
    lapply(files, \(x) {
      df1 <- read.csv(x)
      df1$stim <- trimws(df1$stim)
      df1 <- subset(df1, stim != "")
      write.csv(df1, x, row.names = FALSE, quote = FALSE)
    })
    

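    Now check that the files were correctly saved. The X column holds the row names that the initial write.csv calls stored, since they were made without row.names = FALSE.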

    lapply(files, read.csv)
    #> [[1]]
    #>   X       stim RT Opac
    #> 1 1       face 23   70
    #> 2 2     object 24   60
    #> 3 4 pareidolia 25   65
    #> 
    #> [[2]]
    #>   X       stim RT Opac
    #> 1 1       face 83   70
    #> 2 2 pareidolia 24   87
    #> 3 4 pareidolia 85    6
    

    Created on 2024-04-17 with reprex v2.1.0
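
If the originals should be kept rather than overwritten, which is closer to the file_out idea in the question, a minimal sketch along the following lines may work. It assumes a throwaway directory under tempdir(); the names in_dir, out_dir, toy, and cleaned are hypothetical and not part of the question. The list is named by file so that the names survive lapply and can be used to build the output paths.

```r
# Work in a throwaway directory so the sketch does not touch real data.
in_dir <- file.path(tempdir(), "stim_demo")   # hypothetical input folder
out_dir <- file.path(in_dir, "cleaned")       # hypothetical output folder
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

toy <- data.frame(
  stim = c("face", "object", " ", "pareidolia"),
  RT = c(23, 24, 22, 25),
  Opac = c(70, 60, 80, 65)
)
# row.names = FALSE avoids the extra X column when the file is read back.
write.csv(toy, file.path(in_dir, "data1.csv"), row.names = FALSE)
write.csv(toy, file.path(in_dir, "data2.csv"), row.names = FALSE)

files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)

# Name the input vector by file so the resulting list keeps those names.
datalist <- lapply(setNames(files, basename(files)), read.csv)

# Keep only rows whose stim is non-empty after trimming whitespace.
cleaned <- lapply(datalist, function(df) {
  df[trimws(df$stim) != "", , drop = FALSE]
})

# Write each data frame to its own file in out_dir.
file_out <- file.path(out_dir, names(cleaned))
invisible(Map(
  function(df, path) write.csv(df, path, row.names = FALSE),
  cleaned, file_out
))
```

Map pairs each data frame with its own path, so nothing is recycled onto a single file, and the input files are left as they were.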