I'm reading and processing a stream of input from the ARGV filehandle in Perl (i.e. the while(<>) construct), which behaves like a regular filehandle but may be reading from STDIN. However, I need to analyze a significant portion of the input in order to detect which of four different but extremely similar formats it is encoded in (different ASCII encodings of FASTQ quality scores; see here). Once I've decided which format the data is in, I need to go back and parse those lines a second time to actually read the data.
So I need to read the first 500 lines or so of the stream twice. Or, to look at it another way, I need to read the first 500 lines, and then "put them back" so I can read them again. Since I may be reading from STDIN, I can't just seek back to the beginning. And the files are huge, so I can't just read everything into memory (although reading those first 500 lines into memory is ok). What's the best way to do this?
Alternatively, can I duplicate the input stream somehow?
Edit: Wait a minute. I just realized that I can't process the input as one big stream anymore, because I have to detect each file's format independently. So I can't use ARGV. The rest of the question still stands, though.
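For reference, the per-file loop I have in mind instead of the magic ARGV handle looks roughly like this (just a sketch that ignores the STDIN case for now; detect_format and process_file are placeholders for the parts I'm asking about):

use strict;
use warnings;

for my $fname (@ARGV) {
    open my $fh, '<', $fname or die "Can't open $fname: $!";
    my $format = detect_format($fh);   # needs to peek at ~500 lines, then "put them back"
    process_file($fh, $format);        # parse the whole file, including the peeked lines
    close $fh;
}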
As you said, if the filehandle might be STDIN, you can't use seek to rewind it. But it's still pretty simple. I wouldn't bother with a module:
my @lines;
while (<$file>) {
    push @lines, $_;          # buffer the line for a second pass
    last if @lines == 500;    # only the first 500 lines are needed for detection
}

... # examine @lines to determine format

while (defined( $_ = @lines ? shift @lines : <$file> )) {
    ... # process line: drains the buffer first, then keeps reading the handle
}
Remember that you need an explicit defined in this case, because the special case that adds an implicit defined to some while loops doesn't apply to this more complex expression.
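For comparison, this is the special case at work (just an illustration of the language rule, with print standing in for real processing):

# The simple form is special-cased: Perl treats it as
#     while (defined( $_ = <$file> )) { ... }
# so a final line containing just "0" with no newline is still processed.
while (<$file>) {
    print;
}

# The buffered-or-fresh condition used above is more than a bare readline,
# so Perl adds no implicit defined():
#     while ( $_ = @lines ? shift @lines : <$file> ) { ... }   # wrong: stops on a "0" line
# which is why the explicit defined() is needed.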