perl, gzip, tie

Perl reading only specific gz file lines


I'm trying to write a script that parses a huge gzip-compressed text file (2 million+ lines). I only want to parse a range of lines in the file. So far I've used zgrep -n to find the two lines that mention the string that I know starts and ends the section of the file I'm interested in.

In my test case file I am only interested in reading lines 123080 to 139361. I've found Tie::File, which gives access to a file's lines through the array it ties, but unfortunately this won't work for the gzipped file I'm working with.

Is there something like the following for a gzipped file?

use Tie::File;
tie my @fileLinesArray, 'Tie::File', "hugeFile.txt.gz";
my $startLine = 123080;
my $endLine = 139361;
my $lineCount = $startLine;
while ($lineCount <= $endLine){
    my $line = $fileLinesArray[$lineCount];
    # blah blah...
    $lineCount++;
}

Solution

  • Use IO::Uncompress::Gunzip, which is a core module:

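    use IO::Uncompress::Gunzip qw($GunzipError);
    
    # Line numbers from the question (1-based, inclusive).
    my ($start_line, $end_line) = (123080, 139361);
    
    my $z = IO::Uncompress::Gunzip->new('file.gz')
        or die "gunzip failed: $GunzipError\n";
    # Read and discard every line before the range of interest.
    $z->getline for 1 .. $start_line - 1;
    for ($start_line .. $end_line) {
        my $line = $z->getline;
        ...
    }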
    

    Tie::File gets very slow and memory-hungry when processing large files.
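
    The same idea can be wrapped in a small reusable function. The following is only a sketch: read_gz_line_range is a hypothetical helper name, not part of IO::Uncompress::Gunzip, and it assumes the range is given as 1-based, inclusive line numbers, as in the question. It keeps its own line counter, stops decompressing as soon as the last wanted line has been read, and simply returns fewer lines if the file ends early.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper (not part of any module): returns the lines
# $start .. $end (1-based, inclusive) of a gzip-compressed text file.
sub read_gz_line_range {
    my ($path, $start, $end) = @_;
    return () if $end < $start;

    my $z = IO::Uncompress::Gunzip->new($path)
        or die "Cannot open $path: $GunzipError\n";

    my @lines;
    my $n = 0;    # number of lines decompressed so far
    while (defined(my $line = $z->getline)) {
        $n++;
        next if $n < $start;    # still before the range: discard
        push @lines, $line;
        last if $n >= $end;     # range complete: stop decompressing
    }
    $z->close;
    return @lines;
}

# Example invocation (file name and range taken from the question):
#   perl script.pl hugeFile.txt.gz 123080 139361
if (@ARGV == 3) {
    print read_gz_line_range(@ARGV);
}
```

    Note that a gzip stream cannot be seeked by line number, so everything before the start line still has to be decompressed and thrown away; the saving comes from not storing those lines and from stopping at the end line.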