Search code examples
phputf-8utf8-decode

PHP - Do I need any UTF-8 encoding/decoding?


Ok, I am writing comments to a UTF-8 file that I read within the function below to remove the text in between these comments. My question is, do I need anything different in here to do this successfully for UTF-8 files? Or will the following code below work? Basically, I am wondering if I need utf8_decode and/or utf8_encode functions, or perhaps iconv function?

// This holds the current file we are working on.
$lang_file = 'files/DreamTemplates.russian-utf8.php';

// Can't read from the file if it doesn't exist now can we?
if (!file_exists($lang_file))
    continue;

// This helps to remove the language strings for the template, since the comment is unique
$template_begin_comment = '// ' . ' Template - ' . $lang_file . ' BEGIN...';
$template_end_comment = '// ' . ' Template - ' . $lang_file . ' END!';

$fp = fopen($lang_file, 'rb');
$content = fread($fp, filesize($lang_file));
fclose($fp);

// Searching within the string, extracting only what we need.
$start = strpos($content, $template_begin_comment);
$end = strpos($content, $template_end_comment);

// We can't do this unless both are found.
if ($start !== false && $end !== false)
{
    $begin = substr($content, 0, $start);
    $finish = substr($content, $end + strlen($template_end_comment));

    $new_content = $begin . $finish;

    // Write it into the file.
    $fo = fopen($lang_file, 'wb');
    @fwrite($fo, $new_content);
    fclose($fo);
}

Thanks for your help on this concerning UTF-8 encoding and decoding on strings, even if they are commented strings.

When I write the php comments into the UTF-8 file I am not using any conversion. Should I be?? The string definitions between the php comments is already encoded in UTF-8 however and seems to work fine within the file. Any help appreciated here.


Solution

  • No, you don't need to do any conversions.

    Also, your extraction code will be reliable in the sense that it wont mangle multibyte characters, although you might want to make sure the end position occurs after the start pos.