Tags: php, fopen, fseek, fgetc

PHP: get multiple specific lines using fseek


I'm trying to use only fopen() and fseek() to read specific lines of a file: not just a single line, but also the lines directly above and below the line I seek to.

For performance, I already know how to seek to one specific line, read it, and exit. But if I need line 5, I should also be able to seek to lines 4 and 6.

Here is code that finds the byte offset of each line, up to EOF, and stores it in an array with the line number as the key and the byte offset as the value.

$fh = fopen($source, 'r');
$meta = stream_get_meta_data($fh);

if (!$meta['seekable']) {
    throw new Exception(sprintf("A source is not seekable: %s", print_r($source, true)));
}

$line = fgets($fh, 4096);
$pos = -1;
$i = 0;

$result = null;

$linenum = 10;
var_dump('Line num:'.$linenum);

$total_lines = null;

// Get seek byte end of each line
while (!feof($fh)) {
    $char = fgetc($fh);

    if ($char != "\n" && $char != "\r") {        
        $total_lines[$i] = $pos;

        $pos++;
    } else {
        $i++;
    }    
    //var_dump(fgets($fh).' _ '.$pos);
}

// Now get specific lines (line 5, line 6 and line 7)
$seekssearch = array($total_lines[5], $total_lines[6], $total_lines[7]);

$result = null;
$posr = 0;
foreach ($seekssearch as $sk) {
    while (!feof($fh)) {
        if ($char != "\n" && $char != "\r") {
            fseek($fh, $sk, SEEK_SET);

            $posr++;
        } else {
            $ir++;
        }
    }

    // Merge result of line 5,6 and 7
    $result .= fgets($fh);
}

echo $result;

exit;


while (!feof($fh) && $i < $linenum) {
    $char = fgetc($fh);

    if ($char != "\n" && $char != "\r") {
        fseek($fh, $pos, SEEK_SET);
        $pos++;
    } else {
        $i++;
    }
}
$line = trim(fgets($fh));

var_dump($line);

exit;

while (!feof($fh) && $i < ($linenum - 1)) {
    $char = fgetc($fh);

    if ($char != "\n" && $char != "\r") {
        //fseek($fh, $pos);
        fseek($fh, $pos);
        $pos++;
    } else {
        if ($pos == 3) {
            $line = fgets($fh);
        }

        $i++;
    }
}

//$line = fgets($fh);
var_dump($line); exit;

How can I merge these lines?

Note: I don't want to use SplFileInfo or tricks such as loading the file into an array. I just want to seek, read, and exit.


Solution

  • I've created a function that reads a file, counts its lines, and stores in an array the byte offset to seek to for each line. If a target line is given via $linenum, the while loop breaks once that part of the file has been indexed, to preserve performance; a second loop then seeks to each stored offset and reads that line's content.

    I believe this function can be improved.

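    function readFileSeek($source, $linenum = 0, $range = 0)
    {
        $fh = fopen($source, 'r');
        if ($fh === false) {
            throw new Exception(sprintf("Unable to open source: %s", print_r($source, true)));
        }

        $meta = stream_get_meta_data($fh);
        if (!$meta['seekable']) {
            throw new Exception(sprintf("A source is not seekable: %s", print_r($source, true)));
        }

        // Without a target line there is nothing to read.
        if (!$linenum) {
            fclose($fh);
            return '';
        }

        // $pos runs two bytes ahead of the byte just read, so the last value
        // stored in $result[$n] is the offset just past the newline that ends
        // 0-based line $n (this assumes single-byte "\n" line endings).
        // Empty lines never get an entry in $result.
        $pos = 2;
        $result = array();

        // Window of $result indexes to read back.
        $minline = $linenum - $range - 1;
        $maxline = $minline + $range + $range;

        $totalLines = 0;
        while (!feof($fh)) {
            $char = fgetc($fh);
            if ($char === false) {
                break; // end of file
            }

            if ($char == "\n" || $char == "\r") {
                ++$totalLines;
            } else {
                $result[$totalLines] = $pos;
            }
            $pos++;

            if ($maxline + 1 == $totalLines) {
                // break from while to not read entire file
                break;
            }
        }

        $buffer = '';

        for ($nr = $minline; $nr <= $maxline; $nr++) {
            if (isset($result[$nr])) {
                fseek($fh, $result[$nr], SEEK_SET);

                // Copy one line, including its terminating newline.
                while (!feof($fh)) {
                    $char = fgetc($fh);
                    if ($char === false) {
                        break;
                    }

                    $buffer .= $char;
                    if ($char == "\n" || $char == "\r") {
                        break;
                    }
                }
            }
        }

        fclose($fh);

        return $buffer;
    }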
    

    Test results (1.3 GB file, 100000000 lines, seeking to line 300000):

    string(55) "299998_abc
    299999_abc
    300000_abc
    300001_abc
    300002_abc
    "
    
    
    Time: 612 ms, Memory: 20.00Mb
    
    $ ll -h /tmp/testReadSourceLines_27151460344/41340913936
    -rw-rw-r-- 1  1,3G /tmp/testReadSourceLines_27151460344/41340913936
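
The answer notes that the function can probably be improved. One possible direction, offered only as a sketch: let fgets() consume whole lines and ask ftell() for the offset at which each line starts, instead of counting bytes by hand with fgetc(). The helper names indexLineOffsets() and readLineWindow() are hypothetical and not part of the answer above. The sketch assumes 1-based line numbers and "\n" line endings, and it has not been benchmarked against the 1.3 GB file.

```php
<?php
// Hypothetical helpers, not taken from the original answer.

/**
 * Record the byte offset at which each of the lines $from..$to (1-based)
 * starts. Reading stops at line $to, so the rest of the file is never touched.
 */
function indexLineOffsets($fh, int $from, int $to): array
{
    $offsets = [];
    rewind($fh);
    for ($line = 1; $line <= $to; $line++) {
        $start = ftell($fh);        // offset of the first byte of this line
        if (fgets($fh) === false) { // end of file: this line does not exist
            break;
        }
        if ($line >= $from) {
            $offsets[$line] = $start;
        }
    }
    return $offsets;
}

/**
 * Return lines ($linenum - $range)..($linenum + $range), clamped to the
 * lines that actually exist in the file.
 */
function readLineWindow(string $source, int $linenum, int $range = 0): string
{
    $fh = fopen($source, 'r');
    if ($fh === false) {
        throw new RuntimeException("Unable to open source: $source");
    }

    $first = max(1, $linenum - $range);
    $last  = $linenum + $range;

    $offsets = indexLineOffsets($fh, $first, $last);
    $buffer  = '';

    if (isset($offsets[$first])) {
        // The wanted lines are contiguous, so a single seek is enough.
        fseek($fh, $offsets[$first], SEEK_SET);
        for ($line = $first; $line <= $last; $line++) {
            $text = fgets($fh);
            if ($text === false) {
                break;
            }
            $buffer .= $text;
        }
    }

    fclose($fh);
    return $buffer;
}

// Usage, on a small throwaway fixture:
$tmp = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($tmp, "1_abc\n2_abc\n3_abc\n4_abc\n5_abc\n6_abc\n");
echo readLineWindow($tmp, 5, 1); // prints 4_abc, 5_abc and 6_abc, one per line
unlink($tmp);
```

Because the requested lines sit next to each other, only the first offset is used for seeking. The remaining entries in the index matter only if a caller wants to jump to the lines individually.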