I'm trying to get using only fopen()
and fseek()
to get specific lines of code (not only one lines, i need to get line above and below of current seek line).
To improve performance, I know how to get specific line to seek and then exit. If I need line 5 then should be get seekable into 4 and 6.
Here is a code to get bytes of each lines then put into array as lines as key and value as bytes to EOF
.
$fh = fopen($source, 'r');
$meta = stream_get_meta_data($fh);
if (!$meta['seekable']) {
throw new Exception(sprintf("A source is not seekable: %s", print_r($source, true)));
}
$line = fgets($fh, 4096);
$pos = -1;
$i = 0;
$result = null;
$linenum = 10;
var_dump('Line num:'.$linenum);
$total_lines = null;
// Get seek byte end of each line
while (!feof($fh)) {
$char = fgetc($fh);
if ($char != "\n" && $char != "\r") {
$total_lines[$i] = $pos;
$pos++;
} else {
$i++;
}
//var_dump(fgets($fh).' _ '.$pos);
}
// Now get specific lines (line 5, line 6 and line 7)
$seekssearch = array($total_lines[5], $total_lines[6], $total_lines[7]);
$result = null;
$posr = 0;
foreach ($seekssearch as $sk) {
while (!feof($fh)) {
if ($char != "\n" && $char != "\r") {
fseek($fh, $sk, SEEK_SET);
$posr++;
} else {
$ir++;
}
}
// Merge result of line 5,6 and 7
$result .= fgets($fh);
}
echo $result;
exit;
while (!feof($fh) && $i<($linenum)) {
$char = fgetc($fh);
if ($char != "\n" && $char != "\r") {
fseek($fh, $pos, SEEK_SET);
$pos++;
}
else {
$i++;
}
}
$line = trim(fgets($fh));
var_dump($line);
exit;
exit;
while (!feof($fh) && $i<($linenum-1)) {
$char = fgetc($fh);
if ($char != "\n" && $char != "\r") {
//fseek($fh, $pos);
fseek($fh, $pos);
$pos++;
}
else {
if ($pos == 3) {
$line = fgets($fh);
}
$i++;
}
}
//$line = fgets($fh);
var_dump($line); exit;
How to merge this lines?
Note: I don't want using
splFileInfo
or any tricks like arrays. Just want to seek then exit.
I've created a function that read a file and count lines and store into arrays each lines bytes to seek. If maximum specified by linenum
is set, it will break from while to keep performance than in a new loop function to seek a position in bytes to get a content of file.
I believe that can this function improve.
function readFileSeek($source, $linenum = 0, $range = 0)
{
$fh = fopen($source, 'r');
$meta = stream_get_meta_data($fh);
if (!$meta['seekable']) {
throw new Exception(sprintf("A source is not seekable: %s", print_r($source, true)));
}
$pos = 2;
$result = null;
if ($linenum) {
$minline = $linenum - $range - 1;
$maxline = $minline+$range+$range;
}
$totalLines = 0;
while (!feof($fh)) {
$char = fgetc($fh);
if ($char == "\n" || $char == "\r") {
++$totalLines;
} else {
$result[$totalLines] = $pos;
}
$pos++;
if ($maxline+1 == $totalLines) {
// break from while to not read entire file
break;
}
}
$buffer = '';
for ($nr=$minline; $nr<=$maxline; $nr++) {
if (isset($result[$nr])) {
fseek($fh, $result[$nr], SEEK_SET);
while (!feof($fh)) {
$char = fgetc($fh);
if ($char == "\n" || $char == "\r") {
$buffer .= $char;
break;
} else {
$buffer .= $char;
}
}
}
}
return $buffer;
}
Test results (1.3 GB file, 100000000 lines of codes, seek to 300000 line a code):
string(55) "299998_abc
299999_abc
300000_abc
300001_abc
300002_abc
"
Time: 612 ms, Memory: 20.00Mb
$ ll -h /tmp/testReadSourceLines_27151460344/41340913936
-rw-rw-r-- 1 1,3G /tmp/testReadSourceLines_27151460344/41340913936