
Error in reading multiple text files from a directory in R


I would like to read multiple text files from a directory. The files are named as follows:

 regional_vol_GM_atlas1.txt
 regional_vol_GM_atlas2.txt
 ........
 regional_vol_GM_atlas152.txt

The data in each file look like this:

667869 667869
580083 580083
316133 316133
3631 3631

This is the script that I have written:

library(readr)
library(stringr)
library(data.table)

array <- c()  
for (file in dir(/media/dev/Daten/Task1/subject1/t1)) # path to the directory where .txt files are located
  {  

  row4 <- read.table(file=list.files(pattern ="regional_vol*.txt"),
                     header = FALSE,
                     row.names = NULL,
                     skip = 3,  # Skip the 1st 3 rows
                     nrows = 1,  # Read only the next row after skipping the 1st 3 rows
                     sep = "\t")  # change the separator if it is not "\t"  
  array <- cbind(array, row4)
}

I get the following error:

 Error in file(file, "rt") : invalid 'description' argument

Please tell me where the script goes wrong.


Solution

  • This works for me. Adjust the options noted in the code comments if your files have headers. [Answer edited to reflect new information posted by the OP]

    mydir <- "~/Desktop/a"  # change to your own path

    # List the matching files with full paths.
    # Note: pattern is a regular expression, not a shell glob.
    myfiles <- list.files(mydir, pattern = "^regional_vol.*\\.txt$", full.names = TRUE)
    myfiles  # check that the files are listed correctly

    # Read the first 4 rows of a file, then drop rows 1 to 3 so only row 4 remains.
    # Set header = TRUE if the files have a header, and make sure sep is correct
    # (sep = "" splits on any whitespace).
    read_row4 <- function(x) {
      read.table(x, header = FALSE, skip = 0, nrows = 4, sep = "")[-(1:3), ]
    }

    # Check that the first file is read correctly
    read_row4(myfiles[1])

    # Read every file and stack the rows into one data frame
    df <- do.call(rbind, lapply(myfiles, read_row4))

    # This should be the required data frame
    df
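
As for the original error, it most likely comes from the file argument: read.table() expects a single path, while list.files() returns a character vector. In addition, pattern is a regular expression rather than a shell glob, so "regional_vol*.txt" does not match names such as regional_vol_GM_atlas1.txt at all. The sketch below reproduces both effects in a temporary directory; the directory and the three sample files are made up for illustration.

```r
# Create a throwaway directory with three sample files (illustrative data only).
tmp <- tempfile("demo_")
dir.create(tmp)
for (i in 1:3) {
  writeLines(c("667869 667869", "580083 580083", "316133 316133", "3631 3631"),
             file.path(tmp, sprintf("regional_vol_GM_atlas%d.txt", i)))
}

# The glob-style pattern is interpreted as a regular expression and finds nothing.
bad <- list.files(tmp, pattern = "regional_vol*.txt")
length(bad)

# glob2rx() converts a glob into the equivalent regular expression.
good <- list.files(tmp, pattern = glob2rx("regional_vol*.txt"), full.names = TRUE)
length(good)

# Passing the whole vector of paths as `file` fails; capture the error message.
err <- tryCatch(read.table(file = good, header = FALSE),
                error = function(e) conditionMessage(e))
err
```

In the question's loop, the variable file is never used inside the body. Passing one full path per iteration to read.table() avoids the problem.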
    

    Also, if you are on Linux, a much simpler method is to let the OS do it for you:

    awk 'FNR == 4' regional_vol*.txt
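
If you prefer to stay in R, roughly the same thing as the awk one-liner can be sketched with readLines(). The helper name nth_line is hypothetical, and the code assumes the files sit in the current working directory.

```r
# Hypothetical helper: return line `n` of a text file as a string,
# or NA if the file has fewer than `n` lines.
nth_line <- function(path, n = 4) {
  lines <- readLines(path, n = n)  # read at most n lines
  if (length(lines) < n) NA_character_ else lines[n]
}

# Collect the 4th line of every matching file in the working directory.
files <- list.files(".", pattern = glob2rx("regional_vol*.txt"), full.names = TRUE)
fourth <- vapply(files, nth_line, character(1))
fourth
```

Unlike the read.table() approach, this returns the raw lines as strings, so they still need to be split into columns if you want numbers.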