
Why would anybody want to use strlen instead of mb_strlen?


I have some legacy code that I'm supposed to convert from ISO-8859-2 to UTF-8. One of the problems is the widespread use of the strlen function. My first thought was to replace every occurrence of strlen with mb_strlen.

However, a colleague of mine said this would be a mistake. I know the difference between the two functions: for a string containing accented characters, strlen returns the number of bytes the string actually occupies, while mb_strlen returns the number of characters.
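A minimal sketch of that difference, assuming PHP 7.0+ (for the `\u{...}` escapes) with the mbstring extension enabled. The sample word "zażółć" is my own choice. It also shows why the legacy code got away with strlen: in ISO-8859-2 every Polish letter is a single byte.

```php
<?php
// "zażółć" written with escapes so the result does not depend on the
// encoding this source file is saved in.
$utf8 = "za\u{017C}\u{00F3}\u{0142}\u{0107}";

// strlen() counts bytes: ż, ó, ł and ć each take 2 bytes in UTF-8.
echo strlen($utf8), "\n";                  // 10
// mb_strlen() counts characters in the given encoding.
echo mb_strlen($utf8, 'UTF-8'), "\n";      // 6

// The same text in ISO-8859-2 uses one byte per character, which is why
// strlen() returned the character count in the legacy code.
$latin2 = mb_convert_encoding($utf8, 'ISO-8859-2', 'UTF-8');
echo strlen($latin2), "\n";                // 6
```

After the conversion to UTF-8, any strlen call that was really meant to count characters will over-count by one for every accented letter.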

My colleague said that maybe, just maybe, there is a place somewhere in the code where the number of bytes in the string is what's actually needed, but he couldn't give me a single example of such a situation.

There are about 900 occurrences of strlen in the codebase, and analyzing every single one would take days.

So the question is: in what situations would somebody need the number of bytes in a string rather than the number of characters?


Solution

A few situations come to mind:

  • Storing the string in a file or database
  • Writing the string to a socket to send over the network
  • Calling a legacy API or a COM method that requires the length in bytes
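To illustrate the socket case, here is a minimal sketch of a length-prefixed message. The wire format (a 4-byte big-endian length header followed by the payload) and the `frameMessage` function are hypothetical, chosen only for illustration. The receiver reads exactly as many bytes as the header says, so the header must come from strlen; mb_strlen would under-count and the peer would cut the message short. The same reasoning applies to an HTTP `Content-Length` header, which is also a byte count. Assumes PHP 7.0+ with mbstring.

```php
<?php
// Hypothetical framing: 4-byte big-endian length header + raw payload bytes.
function frameMessage(string $payload): string
{
    // The peer reads exactly this many BYTES, so strlen() is required here;
    // mb_strlen() would under-count any multi-byte UTF-8 character.
    return pack('N', strlen($payload)) . $payload;
}

$msg = "\u{0141}\u{00F3}d\u{017A}";   // "Łódź": 4 characters, 7 bytes in UTF-8
$frame = frameMessage($msg);

// Receiver side: read the header, then take that many bytes.
// substr() also works on byte offsets, so it pairs with strlen().
$len = unpack('N', substr($frame, 0, 4))[1];
$decoded = substr($frame, 4, $len);

echo $len, "\n";                          // 7
echo mb_strlen($msg, 'UTF-8'), "\n";      // 4
var_dump($decoded === $msg);              // bool(true)
```

Had the header been built with mb_strlen, it would say 4, and the receiver would drop the last 3 bytes, leaving a truncated, invalid UTF-8 sequence.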