I have a large .csv file separated by tabs, which has a strict structure with colClasses = c("integer", "integer", "numeric"). For some reason there are a number of irrelevant character ("trash") lines that break the pattern, which is why I get
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() expected 'an integer', got 'ExecutiveProducers'
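The call I use looks roughly like this (the file name is a placeholder):

data <- read.table("yourfile", sep = "\t",
                   colClasses = c("integer", "integer", "numeric"))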
How can I ask read.table to continue and just skip these lines? The file is large, so it's troublesome to clean it up by hand. If that's impossible, should I use scan plus a for-loop?
Right now I just read everything as character, delete the irrelevant rows, and convert the columns back to numeric, which I think is not very memory-efficient.
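My current workaround is roughly this sketch (fill = TRUE is an assumption, to cope with trash lines that have the wrong number of fields):

raw <- read.table("yourfile", sep = "\t", colClasses = "character", fill = TRUE)
# drop rows where any field contains letters, then convert back
bad <- grepl("[[:alpha:]]", raw[[1]]) |
       grepl("[[:alpha:]]", raw[[2]]) |
       grepl("[[:alpha:]]", raw[[3]])
raw <- raw[!bad, ]
raw[[1]] <- as.integer(raw[[1]])
raw[[2]] <- as.integer(raw[[2]])
raw[[3]] <- as.numeric(raw[[3]])

This holds every field as a character string before filtering, hence the memory overhead.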
If your file fits into memory, you could first read the file, remove the unwanted lines, and then parse the remaining lines with read.table (use sep = "\t", since your file is tab-separated):
lines <- readLines("yourfile")
# select only lines that contain no alphabetic characters;
# assuming your column titles are in the first line, that line
# contains letters and is dropped by the grep, so c(1, sel)
# adds it back
sel <- grep("[[:alpha:]]", lines, invert=TRUE)
lines <- lines[c(1, sel)]
# read data from the selected lines
con <- textConnection(lines)
data <- read.table(con, sep = "\t", header = TRUE,
                   colClasses = c("integer", "integer", "numeric"))
close(con)
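textConnection lets read.table parse the already-filtered lines without writing a temporary file; since textConnection returns an open connection, close it yourself when you are done. One caveat: the grep assumes the trash lines are exactly those containing letters, so a number in scientific notation (e.g. 1.5e3) would be thrown away too. If that can happen in your data, use a stricter pattern that matches only the malformed lines.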