Tags: php, gzip, fseek

PHP gzseek() seems to seek beyond the size of the file


I have run into a strange problem on and off over the last two years when trying to determine the size of files, in particular gzip-compressed files. I have found workarounds, but they are not ideal. The problem is that gzseek() always seems to seek up to approx. 2.14 GB, regardless of the size of the uncompressed file. When testing, I established the uncompressed file size by 1) unzipping and saving as text, and 2) using gzread() to read 1 MB at a time until end of file. Let's say that the uncompressed file size is 13 MB.

Test code with gzseek() and gztell(). This advances the handle by 1 MB (1000000 bytes) at a time, but always continues up to approx. 2.14 GB regardless of the uncompressed file size:

//GZ file is opened ....

gzseek($Handle, 0, SEEK_SET);
while (true) {
  //Seek through the file, advancing the offset by 1000000 bytes each time
  $Eof = gzseek($Handle, 1000000, SEEK_CUR);  //0 on success, -1 on failure (assumed here to mean past eof)

  //This dumps the handle position, which increases by 1000000 bytes at a time but continues
  //until approx. 2.14 GB even though the file is 13 MB uncompressed
  var_dump(gztell($Handle));

  //When the handle (via gztell()) shows 2.14 GB, gzseek() returns -1, which I take to mean
  //it has reached / gone past end of file
  if ( $Eof === -1 ) {
     //This is only true once gztell() shows approx. 2.14 GB
     break;
  }

}

If I use gzread() instead, it works fine: the handle advances 1 MB (1000000 bytes) at a time until 13 MB. E.g.:

while ( !gzeof($Handle) ) {
   $Data = gzread($Handle, 1000000);
}

Having researched this a lot over the last few years, I have never found a working solution for measuring the uncompressed size of gz files, nor any reports on why it can't be done with gzseek(), which I find a bit strange. Either gzseek() doesn't work, in which case I would expect to have found that reported, or I am really missing something here. Thanks for the help, Chris


Solution

  • What you are missing is that, just like fseek(), gzseek() can and will set the read or write pointer to wherever you ask, including beyond the end of the file. Those functions do not check for the end of file. In fact, fseek() clears the end-of-file flag, to allow reading a growing file.

    Only if you do a read after the seek will it determine if you are at or past the end of file.

    Contrary to your claim, you have in fact found a working solution for determining the uncompressed size of a gzip file, which is to use gzread().
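
    The gzread() approach described above can be wrapped in a small helper. This is only a minimal sketch, assuming PHP's zlib extension is loaded. The function name gz_uncompressed_size() and the default chunk size are illustrative choices and are not part of PHP. It counts the bytes that gzread() actually returns, so the total is the real uncompressed size rather than wherever a seek happened to place the pointer.

```php
<?php
// Hypothetical helper (not a built-in): measures the uncompressed size of a
// gzip file by reading it to the end and counting the bytes returned.
function gz_uncompressed_size(string $path, int $chunkSize = 1000000): int
{
    $handle = gzopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $total = 0;
    // Only a read can discover end of file, so loop on gzread() and gzeof().
    while (!gzeof($handle)) {
        $data = gzread($handle, $chunkSize);
        if ($data === false) {
            gzclose($handle);
            throw new RuntimeException("Read error in $path");
        }
        if ($data === '') {
            // Nothing more to read; stop rather than risk spinning forever.
            break;
        }
        $total += strlen($data);
    }

    gzclose($handle);
    return $total;
}
```

    Called as gz_uncompressed_size('data.gz') (the file name is just a placeholder), it has to decompress the whole file once, so the cost grows with the uncompressed size. The data is discarded chunk by chunk, so memory use stays bounded by the chunk size.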