If that's relevant (it very well could be), they are PHP source code files.
There are a few pitfalls to take care of:
- If `<?php header('Content-Type: text/html') ?>` at the beginning of an otherwise empty file doesn't trigger a warning, you're fine.
- `strlen` really returns the number of bytes in the string, not the actual number of characters. This isn't too much of a problem until you start splicing strings of non-ASCII characters with functions like `substr`: when you do, the indices you pass refer to byte offsets rather than character offsets, and this can cause your script to break non-ASCII characters in two. For instance, `echo substr("é", 0, 1)` will output an invalid UTF-8 sequence because in UTF-8, `é` actually takes two bytes and `substr` returns only the first one. (The solution is to use the `mb_` string functions, which are aware of multibyte encodings.)
- `SET CHARACTER SET UTF8` (or something along these lines), or, if you can't find a better way, `mb_convert_encoding` or `iconv` will convert a string from one encoding to another.
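
To make the byte-versus-character point concrete, here is a minimal sketch. It assumes the `mbstring` and `iconv` extensions are loaded, and it assumes ISO-8859-1 as the source encoding for the conversion step purely for illustration; the variable names are made up for this example. The `é` is written as explicit bytes so the result does not depend on how the file itself is saved.

```php
<?php
// "é" spelled out as its two UTF-8 bytes.
$e = "\xC3\xA9";
$word = "caf" . $e; // "café"

// strlen() counts bytes; mb_strlen() counts characters in the given encoding.
$byteLength = strlen($word);             // 5
$charLength = mb_strlen($word, 'UTF-8'); // 4

// substr() cuts at byte offsets, so it can split a character in two.
$broken = substr($e, 0, 1);                           // only the first byte of "é"
$brokenIsValid = mb_check_encoding($broken, 'UTF-8'); // false

// mb_substr() cuts at character offsets and keeps "é" intact.
$lastChar = mb_substr($word, 3, 1, 'UTF-8');

// Last resort for data that arrives in another encoding
// (ISO-8859-1 is an assumption made for this example).
$latin1 = "\xE9"; // "é" in ISO-8859-1
$viaMb = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$viaIconv = iconv('ISO-8859-1', 'UTF-8', $latin1);
```

Passing the encoding explicitly to the `mb_` functions avoids relying on the configured internal encoding, which may differ between servers.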