
Specify column classes when reading in data via lapply(FileList, read.xls)


My question is how to specify the class of individual columns when reading in data from many files. Specifically, I am reading in thousands of .xlsx files at a time, converting them to .csv with the read.xls() function in the gdata package.

My approach is as follows:

library(gdata)
Myfiles <- list.files() # lists all files in the working directory (which contains the data files)
Mylist <- lapply(Myfiles, read.xls, header=TRUE,
    perl="C:/Users/A/PERL/perl/bin/perl.exe",
    sheet=1,
    method="csv",
    skip=1,
    as.is=1)

I apologize for not providing a workable example. I'm not sure how to do so for this problem.

All the .xlsx files have identical headers and layout, but the classes of corresponding columns in the data frames within Mylist are not all the same. Is there a way to specify the classes within the lapply() approach I am using? I know that read.xls() accepts read.table() arguments, but I haven't figured out how to specify the column classes properly within the lapply() call.


Solution

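  • It's all in Gabor's comment, but to put this one to bed: pass colClasses as an extra argument to lapply(), which forwards it to read.xls() and from there to read.table(). Give one class per column, in column order: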

    lapply(Myfiles, read.xls, colClasses = c("character", "numeric", "factor"), header = TRUE)
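
To see the effect without needing Perl or real .xlsx files, here is a minimal sketch that uses read.csv() as a stand-in for read.xls(). That substitution is an assumption made for the demo; it relies on read.xls() forwarding extra arguments to read.table(). The file names, column names, and values are hypothetical.

```r
# Stand-in demo: read.csv() replaces gdata::read.xls() so the example runs
# without Perl or Excel files. File names and columns are hypothetical.
tmp_dir <- tempfile("colclasses_demo")
dir.create(tmp_dir)

# Two files with identical headers; in the second file the "id" column
# happens to look numeric, so type guessing will disagree between files.
writeLines(c("id,value,group", "a01,1.5,x", "a02,2.5,y"),
           file.path(tmp_dir, "file1.csv"))
writeLines(c("id,value,group", "101,3,x", "102,4,z"),
           file.path(tmp_dir, "file2.csv"))

demo_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Without colClasses, column types are guessed separately for each file.
guessed <- lapply(demo_files, read.csv, header = TRUE)

# With colClasses, lapply() forwards the same argument to every call.
typed <- lapply(demo_files, read.csv, header = TRUE,
                colClasses = c("character", "numeric", "factor"))

# One column of the result per file, one row per data column.
guessed_classes <- sapply(guessed, function(df) sapply(df, class))
typed_classes <- sapply(typed, function(df) sapply(df, class))
print(guessed_classes)
print(typed_classes)
```

In the guessed version the id and value columns come back with different classes in the two files, while the typed version gives the same three classes for both. If only some columns need fixing, read.table() also accepts a named colClasses vector (for example c(id = "character")) and guesses the rest.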