Search code examples
rloopsfor-looplapplywrite.table

Using lapply to apply a function over read-in list of files and saving output as new list of files


I'm quite new at R and a bit stuck on what I feel is likely a common operation to do. I have a number of files (57 with ~1.5 billion rows cumulatively by 6 columns) that I need to perform basic functions on. I'm able to read these files in and perform the calculations I need no problem but I'm tripping up in the final output. I envision the function working on 1 file at a time, outputting the worked file and moving onto the next.

After calculations I would like to output 57 new .txt files named after the file the input data first came from. So far I'm able to perform the calculations on smaller test datasets and spit out 1 appended .txt file but this isn't what I want as a final output.

#list filenames 
files <- list.files(path=, pattern="*.txt", full.names=TRUE, recursive=FALSE)

#begin looping process
loop_output = lapply(files, 
function(x) {

#Load 'x' file in
DF<- read.table(x, header = FALSE, sep= "\t")

#Call calculated height average a name
R_ref= 1647.038203

#Add column names to .las data
colnames(DF) <- c("X","Y","Z","I","A","FC")

#Calculate return
DF$R_calc <- (R_ref - DF$Z)/cos(DF$A*pi/180)

#Calculate intensity
DF$Ir_calc <- DF$I * (DF$R_calc^2/R_ref^2)

#Output new .txt with calcuated columns
write.table(DF, file=, row.names = FALSE, col.names = FALSE, append = TRUE,fileEncoding = "UTF-8")

})

My latest code endeavors have been to mess around with the intial lapply/sapply function as so:

#begin looping process
loop_output = sapply(names(files), 
function(x) {

As well as the output line:

#Output new .csv with calcuated columns 
write.table(DF, file=paste0(names(DF), "txt", sep="."),
row.names = FALSE, col.names = FALSE, append = TRUE,fileEncoding = "UTF-8")

From what I've been reading the file naming function during write.table output may be one of the keys I don't have fully aligned yet with the rest of the script. I've been viewing a lot of other asked questions that I felt were applicable:

Using lapply to apply a function over list of data frames and saving output to files with different names

Write list of data.frames to separate CSV files with lapply

to no luck. I deeply appreciate any insights or paths towards the right direction on inputting x number of files, performing the same function on each, then outputting the same x number of files. Thank you.


Solution

  • The reason the output is directed to the same file is probably that file = paste0(names(DF), "txt", sep=".") returns the same value for every iteration. That is, DF must have the same column names in every iteration, therefore names(DF) will be the same, and paste0(names(DF), "txt", sep=".") will be the same. Along with the append = TRUE option the result is that all output is written to the same file.

    Inside the anonymous function, x is the name of the input file. Instead of using names(DF) as a basis for the output file name you could do some transformation of this character string.

    example.

    Given

    x <- "/foo/raw_data.csv"
    

    Inside the function you could do something like this

    infile <- x
    outfile <- file.path(dirname(infile), gsub('raw', 'clean', basename(infile)))
    
    outfile
    [1] "/foo/clean_data.csv"
    

    Then use the new name for output, with append = FALSE (unless you need it to be true)

    write.table(DF, file = outfile, row.names = FALSE, col.names = FALSE, append = FALSE, fileEncoding = "UTF-8")