How can you loop through paired-end fastq files? For single-end reads you can do the following:
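library(ShortRead)
strm <- FastqStreamer("./my.fastq.gz")
repeat {
  fq <- yield(strm)
  if (length(fq) == 0)
    break
  # do things
  # writeFastq (rather than writeFasta) keeps the quality scores; mode = "a" appends each chunk
  writeFastq(fq, "output.fq.gz", mode = "a")
}
close(strm)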
However, if I edit one file of a paired-end set, I somehow need to update the second file as well, so that the two files still correspond to each other.
Paired-end fastq files are typically ordered, so you could keep track of the lines that are removed from one file and remove the same lines from the paired file. But this isn't a great method, and if your data is line-wrapped you will be in pain.
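If you do want to rely on the ordering, working on record positions rather than line numbers avoids most of the bookkeeping. Below is a minimal sketch, assuming both files hold the same number of reads in the same order. The names `both_pass`, `filter_pairs` and `keep_fun` are made up for illustration and are not part of ShortRead.

```r
# A pair survives only if both mates pass.
both_pass <- function(pass1, pass2) {
  stopifnot(length(pass1) == length(pass2))
  pass1 & pass2
}

# Stream two paired FASTQ files in lockstep and write only the surviving pairs.
# keep_fun is a hypothetical user-supplied function: one chunk in, logical vector out.
filter_pairs <- function(in1, in2, out1, out2, keep_fun, n = 1e6) {
  strm1 <- ShortRead::FastqStreamer(in1, n = n)
  strm2 <- ShortRead::FastqStreamer(in2, n = n)
  on.exit({ close(strm1); close(strm2) })
  repeat {
    fq1 <- ShortRead::yield(strm1)
    fq2 <- ShortRead::yield(strm2)
    if (length(fq1) == 0L && length(fq2) == 0L) break
    # Equal chunk sizes mean record i of fq1 is the mate of record i of fq2;
    # both_pass() stops with an error if the chunks differ in length.
    keep <- both_pass(keep_fun(fq1), keep_fun(fq2))
    if (any(keep)) {
      ShortRead::writeFastq(fq1[keep], out1, mode = "a")
      ShortRead::writeFastq(fq2[keep], out2, mode = "a")
    }
  }
  invisible(c(out1, out2))
}

# Example call (file names and the length cut-off are placeholders):
# library(ShortRead)
# filter_pairs("R1.fastq.gz", "R2.fastq.gz", "R1.filt.fastq.gz", "R2.filt.fastq.gz",
#              keep_fun = function(fq) width(fq) >= 50L)
```

This only checks that the chunks have equal length, not that the headers actually match, so it is only as safe as the ordering assumption.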
A better way would be to use the header information.
The headers for the paired reads in the two files are identical, except for the field that specifies whether the read is forward or reverse (1 or 2):
First read from file 1: @M02621:7:000000000-ARATH:1:1101:15643:1043 1:N:0:12
First read from file 2: @M02621:7:000000000-ARATH:1:1101:15643:1043 2:N:0:12
The numbers 1101:15643:1043 refer to the tile and the x and y coordinates on that tile, respectively.
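To make the header layout concrete, here is a small base-R sketch that splits a header of this style into named fields. The function name `parse_illumina_header` is made up, and the field names for the leading parts (instrument, run, flowcell, lane) follow the usual Illumina convention rather than anything stated above, so treat them as assumptions.

```r
# Split a header such as
# "@M02621:7:000000000-ARATH:1:1101:15643:1043 1:N:0:12" into named fields.
parse_illumina_header <- function(header) {
  header <- sub("^@", "", header)
  parts <- strsplit(header, " ", fixed = TRUE)[[1]]
  loc <- strsplit(parts[1], ":", fixed = TRUE)[[1]]   # part before the space
  mate <- strsplit(parts[2], ":", fixed = TRUE)[[1]]  # part after the space
  list(
    instrument = loc[1],
    run = loc[2],
    flowcell = loc[3],
    lane = as.integer(loc[4]),
    tile = as.integer(loc[5]),
    x = as.integer(loc[6]),
    y = as.integer(loc[7]),
    read = as.integer(mate[1])  # 1 or 2
  )
}

h <- parse_illumina_header("@M02621:7:000000000-ARATH:1:1101:15643:1043 1:N:0:12")
str(h[c("tile", "x", "y", "read")])
```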
The tile and coordinate numbers uniquely identify each read pair within a lane of a given run. Using this information, you can remove reads from the second file if they are not in the first file.
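A rough sketch of that header-based clean-up in R could look like the following. It assumes the already-filtered first file is small enough to load its IDs into memory and that the headers follow the space-separated style shown above. `pair_key` and `sync_second_file` are hypothetical helper names, not ShortRead functions.

```r
# Drop everything after the first space so both mates share the same key.
pair_key <- function(ids) sub(" .*$", "", ids)

# Keep only the reads of file 2 whose mate is still present in the filtered file 1.
sync_second_file <- function(filtered1, in2, out2, n = 1e6) {
  # Assumption: the whole filtered file 1 fits in memory.
  ids1 <- as.character(ShortRead::id(ShortRead::readFastq(filtered1)))
  keys1 <- pair_key(ids1)
  strm <- ShortRead::FastqStreamer(in2, n = n)
  on.exit(close(strm))
  repeat {
    fq2 <- ShortRead::yield(strm)
    if (length(fq2) == 0L) break
    keep <- pair_key(as.character(ShortRead::id(fq2))) %in% keys1
    if (any(keep)) ShortRead::writeFastq(fq2[keep], out2, mode = "a")
  }
  invisible(out2)
}

# Example call (file names are placeholders):
# sync_second_file("R1.filt.fastq.gz", "R2.fastq.gz", "R2.filt.fastq.gz")
```

Matching on the full part before the space (instrument through y coordinate) is slightly stricter than matching on tile and coordinates alone, and it also covers data from more than one lane.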
Alternatively, if you are doing quality trimming... Trimmomatic can perform quality/length filtering on paired-end data, and it's fast...