php parsing strpos

Parsing tags with mb_strpos and substr


I have two files:
1: template.html (UTF-8 encoded), with this content:

<tag>
<output>
</output>
</tag>

2: parser.php (UTF-8 encoded), with this content:

$fileContent = (file_get_contents('template.html'));

echo 'Pos #1: <b>'.$pos1 = mb_strpos($fileContent, '<'); echo '</b><br />';
echo 'Pos #2: <b>'.$pos2 = mb_strpos($fileContent, '>'); echo '</b><br />';
echo 'Substring by Pos1 & Pos2: <b>'.htmlentities(substr($fileContent, $pos1, $pos2)).'</b>';

I am trying to parse the tags and need to know their correct positions. When I use substr, I notice a problem. The output is:

Pos #1: 0
Pos #2: 10
Substring by Pos1 & Pos2: <tag

I need the correct way to do this. The result is supposed to be:

Pos #1: 0
Pos #2: 11
Substring by Pos1 & Pos2: <tag>

Solution

  • Extracting a substring takes a start, which is a position, and a length, which is a count and not an end position.

    You can get the length by doing:

    $length = $pos2 - $pos1 + 1;
    

    Also, you are processing a Unicode string and already use mb_strpos to find the positions, yet you still extract the substring with substr, which counts bytes. Use mb_substr instead, so that the start and the length are both measured in characters.

    mb_substr()

    Performs a multi-byte safe substr() operation based on number of characters. Position is counted from the beginning of str. First character's position is 0. Second character position is 1, and so on.

    http://php.net/manual/en/function.mb-substr.php
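
Putting both points together, here is a minimal sketch of the corrected extraction. It assumes the mbstring extension is enabled and that the script itself is saved as UTF-8. The sample string and its non-ASCII tag name are made up for illustration (they are not the asker's actual template), chosen so that byte offsets and character offsets differ.

```php
<?php
// Hypothetical sample input: a Cyrillic tag name, so each letter is 2 bytes in UTF-8.
$fileContent = "<тег>\n<output>\n</output>\n</тег>";

// Both positions are counted in characters, not bytes.
$pos1 = mb_strpos($fileContent, '<', 0, 'UTF-8');
$pos2 = mb_strpos($fileContent, '>', 0, 'UTF-8');

// The third argument is a length, so convert the end position into a count.
$length = $pos2 - $pos1 + 1;

// Character-based extraction, consistent with mb_strpos.
$tag = mb_substr($fileContent, $pos1, $length, 'UTF-8');

// For comparison only: substr counts bytes, so the same arguments cut the tag short.
$byteCut = substr($fileContent, $pos1, $length);

echo 'Pos #1: ', $pos1, PHP_EOL;
echo 'Pos #2: ', $pos2, PHP_EOL;
echo 'Substring by Pos1 & Pos2: ', htmlspecialchars($tag, ENT_QUOTES, 'UTF-8'), PHP_EOL;
```

Passing the encoding explicitly to each mb_* call is a choice made here so the sketch does not depend on the configured internal encoding; you can omit it if your default is already UTF-8.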