byte position: file_get_contents vs fopen


I need to read a specific byte range from a binary file.
(concatenated jpegs, don't ask...)

So I have an offset and a length from an external API.
(I would guess that those are byte positions)

What works is the following:

$fileData = file_get_contents($binaryFile);
$imageData = substr($fileData, $offset, $length);

But I would rather not load the full file into memory, and therefore tried fopen:

$handle = fopen($binaryFile, 'rb');
fseek($handle, $offset);
$imageData = fgets($handle, $length);

But that doesn't work. The data chunk is not valid image data.
So I assume I got the position wrong with fopen.

Any ideas on how the positions differ in substr vs fopen?


Solution

  • You wrote

    The data chunk is not valid image data

    You say "image data", yet your code calls fgets() to read it. That's wrong: an image is binary data, not text, so you don't want to read it line by line (docs):

    fgets — Gets line from file pointer

    This means fgets() stops reading as soon as it hits what it considers an end-of-line marker. In binary data that usually happens well before $length bytes have been read, because a newline byte (0x0A) is very likely to occur somewhere in the sequence. Even without one, fgets() returns at most $length - 1 bytes.
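
    To see the difference in isolation, here is a small sketch that runs both calls against the same bytes. The temp file and its eight bytes are made up for illustration; only the fgets()/fread() behaviour matters.

    ```php
    <?php
    // Hypothetical sample: 8 bytes with a newline byte (0x0A) at index 3.
    $path = tempnam(sys_get_temp_dir(), 'bin');
    file_put_contents($path, "\xFF\xD8\xFF\x0A\x01\x02\x03\x04");

    $handle = fopen($path, 'rb');

    // fgets() stops right after the first 0x0A byte
    // (and would never return more than length - 1 bytes anyway).
    fseek($handle, 0);
    $viaFgets = fgets($handle, 8);

    // fread() ignores line endings and returns the bytes requested.
    fseek($handle, 0);
    $viaFread = fread($handle, 8);

    fclose($handle);
    unlink($path);

    echo strlen($viaFgets), "\n"; // 4
    echo strlen($viaFread), "\n"; // 8
    ```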

    So fgets() is the wrong function to use, and that is the main issue. Use the less clever fread() instead: it knows nothing about lines and simply reads the number of bytes you ask for. When you are done, fclose() the handle. And, naturally, always check for errors, starting with fopen():

    if ($handle = fopen($binaryFile, 'rb')) {
        if (fseek($handle, $offset) === 0) {
            $imageData = fread($handle, $length);
            if ($imageData === false) {
                // error handling - failed to read the data
            }
        } else {
            // error handling - seek failed
        }
        fclose($handle);
    } else {
        // error handling - can't open file
    }
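
    If you need this in more than one place, the same steps can be wrapped in a function. The following is only a sketch: the name readByteRange() and the choice to throw RuntimeException are my own assumptions, not anything from your code. It also loops over fread(), which may return fewer bytes than requested on some stream types, and treats a short result as an error.

    ```php
    <?php
    /**
     * Read $length bytes starting at byte $offset (hypothetical helper).
     * Throws RuntimeException if the file cannot be opened, the seek fails,
     * a read fails, or the range runs past the end of the file.
     */
    function readByteRange(string $path, int $offset, int $length): string
    {
        $handle = fopen($path, 'rb');
        if ($handle === false) {
            throw new RuntimeException("Cannot open $path");
        }

        try {
            if (fseek($handle, $offset) !== 0) {
                throw new RuntimeException("Cannot seek to offset $offset");
            }

            $data = '';
            // Keep reading until we have everything or hit end of file.
            while (strlen($data) < $length && !feof($handle)) {
                $chunk = fread($handle, $length - strlen($data));
                if ($chunk === false) {
                    throw new RuntimeException('Read failed');
                }
                if ($chunk === '') {
                    break; // nothing more to read
                }
                $data .= $chunk;
            }

            if (strlen($data) !== $length) {
                throw new RuntimeException('Range extends past end of file');
            }

            return $data;
        } finally {
            // Runs on both the success and the error path.
            fclose($handle);
        }
    }
    ```

    With that in place, the call from the question becomes $imageData = readByteRange($binaryFile, $offset, $length);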
    

    So always use the right tool for the task, and if you are unsure what a given function does, there is always the not-that-bad documentation to consult.
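
    As for the positions themselves: substr() and fseek() both count bytes from zero, so the offsets do not differ. In fact file_get_contents() accepts offset and length arguments of its own, which lets you keep the one-liner without reading the whole file. A minimal sketch, with a made-up temp file standing in for $binaryFile and arbitrary sample values for $offset and $length:

    ```php
    <?php
    // Made-up sample file standing in for $binaryFile.
    $binaryFile = tempnam(sys_get_temp_dir(), 'bin');
    file_put_contents($binaryFile, "AAAA\xFF\xD8\x0A\xFF\xD9BBBB");
    $offset = 4;
    $length = 5;

    // Arguments: filename, use_include_path, context, offset, length.
    $direct = file_get_contents($binaryFile, false, null, $offset, $length);

    // The approach from the question, for comparison.
    $viaSubstr = substr(file_get_contents($binaryFile), $offset, $length);

    var_dump($direct === $viaSubstr); // bool(true)

    unlink($binaryFile);
    ```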