I have an instrument that produces data files containing a large amount of header information. I want to read many files at once and rbind
them together. So far I have been using the following loop, with skip to jump past the header:
df <- c()
for (x in list.files(pattern="*.cnv", recursive=TRUE)) {
u <-read.table(x, skip=100)
df <- rbind(df, u)
}
Here is an example of a data file with 5 header lines followed by the *END* marker:
# Header information
# Header information
# Header information
# Header information
# Header information
*END*
0.571 26.6331 8.2733 103.145 0.0842 -0.000049 0.000e+00
0.576 26.6316 8.2756 103.171 0.3601 -0.000049 0.000e+00
0.574 26.6322 8.2744 103.157 0.3613 -0.000046 0.000e+00
The issue is that the number of header lines varies from file to file, so I would like a general solution. Fortunately, the header in every file ends with this line:
*END*
So my question is: how can I read in a file like the one above while skipping every line up to and including the *END* line? This would presumably happen before rbind-ing the files together.
Read the input line by line using
all_content = readLines("input.txt")
> all_content
[1] "# Header information"
[2] "# Header information"
[3] "# Header information"
[4] "# Header information"
[5] "# Header information"
[6] "*END*"
[7] " 0.571 26.6331 8.2733 103.145 0.0842 -0.000049 0.000e+00"
[8] " 0.576 26.6316 8.2756 103.171 0.3601 -0.000049 0.000e+00"
[9] " 0.574 26.6322 8.2744 103.157 0.3613 -0.000046 0.000e+00"
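Then remove every line up to and including *END*, using grep to locate it. Pass fixed = TRUE so that the asterisks are matched literally rather than treated as regex metacharacters:
skip = all_content[-c(1:grep("*END*", all_content, fixed = TRUE))]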
Now read the remaining lines with read.table as usual:
input <- read.table(textConnection(skip))
> input
V1 V2 V3 V4 V5 V6 V7
1 0.571 26.6331 8.2733 103.145 0.0842 -4.9e-05 0
2 0.576 26.6316 8.2756 103.171 0.3601 -4.9e-05 0
3 0.574 26.6322 8.2744 103.157 0.3613 -4.6e-05 0
You get the desired result.
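The steps above can also be wrapped in a small helper function. This is only a minimal sketch: the name read_after_end and its marker argument are made up for illustration, it assumes the marker appears at least once in each file, and it uses fixed = TRUE so the asterisks in *END* are searched for literally.

```r
# Hypothetical helper (name and arguments are illustrative, not from any package).
# Reads a file and parses only the lines after the first line containing `marker`.
read_after_end <- function(path, marker = "*END*") {
  all_content <- readLines(path)
  # fixed = TRUE searches for the literal text instead of treating it as a regex
  end_line <- grep(marker, all_content, fixed = TRUE)
  if (length(end_line) == 0) {
    stop("marker not found in ", path)
  }
  # Drop everything up to and including the first marker line
  body <- all_content[-seq_len(end_line[1])]
  read.table(text = body)
}

# Small self-contained demonstration using a temporary file
tmp <- tempfile(fileext = ".cnv")
writeLines(c("# Header information",
             "# Header information",
             "*END*",
             " 0.571 26.6331 8.2733 103.145 0.0842 -0.000049 0.000e+00",
             " 0.576 26.6316 8.2756 103.171 0.3601 -0.000049 0.000e+00"),
           tmp)
demo <- read_after_end(tmp)
```

Stopping with an error when the marker is missing is a design choice in this sketch; without that check, a file lacking *END* would fail later with a less helpful message.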
UPDATE
In your loop just use
df <- NULL
for (x in list.files(pattern = "\\.cnv$", recursive = TRUE)) {
  all_content <- readLines(x)
  # fixed = TRUE matches the literal text "*END*" rather than a regex
  skip <- all_content[-c(1:grep("*END*", all_content, fixed = TRUE))]
  input <- read.table(textConnection(skip))
  df <- rbind(df, input)
}
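Growing df with rbind inside a loop copies the data frame on every iteration, which can get slow with many files. As an alternative sketch, you could read each file into a list and bind once at the end. The helper name read_cnv and the temporary demo directory are assumptions for illustration only; the pattern "\\.cnv$" is the regular-expression form of the *.cnv glob, and each file is assumed to contain the *END* line.

```r
# Hypothetical helper: parse the lines after the first literal "*END*" line
read_cnv <- function(path) {
  all_content <- readLines(path)
  end_line <- grep("*END*", all_content, fixed = TRUE)[1]
  read.table(text = all_content[-seq_len(end_line)])
}

# Demo setup: two small files with different header lengths in a temp directory
demo_dir <- tempfile("cnv_demo")
dir.create(demo_dir)
writeLines(c("# Header", "*END*", " 0.571 26.6331", " 0.576 26.6316"),
           file.path(demo_dir, "a.cnv"))
writeLines(c("# Header", "# Header", "# Header", "*END*", " 0.574 26.6322"),
           file.path(demo_dir, "b.cnv"))

files <- list.files(demo_dir, pattern = "\\.cnv$",
                    recursive = TRUE, full.names = TRUE)
# Read every file into a list of data frames, then bind them in one call
df <- do.call(rbind, lapply(files, read_cnv))
```

For your own data, replace demo_dir with the directory that holds the .cnv files.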