Wikipedia defines a lot of possible emoticons people can use. I want to match this list to words in a string. I now have this:
$string = "Lorem ipsum :-) dolor :-| samet";
$emoticons = array(
    '[HAPPY]' => array(' :-) ', ' :) ', ' :o) '), //etc...
    '[SAD]' => array(' :-( ', ' :( ', ' :-| ')
);
foreach ($emoticons as $emotion => $icons) {
    $string = str_replace($icons, " $emotion ", $string);
}
echo $string;
Output:
Lorem ipsum [HAPPY] dolor [SAD] samet
So in principle this works. However, I have a few questions:
As you can see, I'm putting spaces around each emoticon in the array, such as ' :-) ' instead of ':-)'. This makes the array less readable in my opinion. Is there a way to store the emoticons without the spaces, but still match them against $string with spaces around them (and as efficiently as the code does now)?
Or is there perhaps a way to put the emoticons in one variable, and explode on space to check against $string? Something like
$emoticons = array(
    '[HAPPY]' => ">:] :-) :) :o) :] :3 :c) :> =] 8) =) :} :^)",
    '[SAD]' => ":'-( :'( :'-) :')" //etc...
);
Is str_replace the most efficient way of doing this?
I'm asking because I need to check millions of strings, so I'm looking for the most efficient way to save processing time :)
If the $string in which you want to replace emoticons comes from a visitor to your site (user input such as a comment), you should not rely on there being a space before or after each emoticon. There are also at least a couple of emoticons that are very similar but different, like :-) and :-)). So I think you will get better results if you define your emoticon array like this:
$emoticons = array(
    ':-)' => '[HAPPY]',
    ':)' => '[HAPPY]',
    ':o)' => '[HAPPY]',
    ':-(' => '[SAD]',
    ':(' => '[SAD]',
    // ...
);
Once you have filled in all the find/replace definitions, order the array so that a shorter emoticon can never be replaced inside a longer one, for example :-) inside :-)). Sorting the array by key length, longest first, should be enough. This only matters if you use str_replace(); strtr() called with an array tries the longest keys first automatically.
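To illustrate the ordering problem, here is a minimal sketch. The ':-))' entry and its '[VERY_HAPPY]' label are made up for the example and are not part of the list above.

```php
<?php
// Hypothetical map: '[VERY_HAPPY]' is an invented label, used only for illustration.
$emoticons = array(
    ':-)'  => '[HAPPY]',
    ':-))' => '[VERY_HAPPY]',
    ':-('  => '[SAD]',
);

$string = 'Lorem :-)) ipsum :-) dolor :-(';

// str_replace() applies the pairs in array order, so ':-)' fires first
// and leaves a stray ')' behind where ':-))' used to be.
$naive = str_replace(array_keys($emoticons), array_values($emoticons), $string);

// Sort a copy of the map by key length, longest first, before using str_replace().
$sorted = $emoticons;
uksort($sorted, function ($a, $b) {
    return strlen($b) - strlen($a);
});
$ordered = str_replace(array_keys($sorted), array_values($sorted), $string);

// strtr() with an array tries the longest key first without any sorting.
$translated = strtr($string, $emoticons);

echo $naive, "\n";      // Lorem [HAPPY]) ipsum [HAPPY] dolor [SAD]
echo $ordered, "\n";    // Lorem [VERY_HAPPY] ipsum [HAPPY] dolor [SAD]
echo $translated, "\n"; // Lorem [VERY_HAPPY] ipsum [HAPPY] dolor [SAD]
```

Another difference worth knowing: strtr() does not search text it has already replaced, while str_replace() runs each pair over the result of the previous one.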
If you are concerned about performance, you can compare strtr against str_replace, but I suggest running your own tests (results may differ depending on the length of your $string and on your find/replace definitions).
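A rough way to run such a test yourself is sketched below. The sample text, the short map and the iteration count are placeholders; swap in your real data, since the numbers mean little otherwise.

```php
<?php
// Illustrative benchmark only: map, sample string and iteration count are assumptions.
$emoticons = array(
    ':-)' => '[HAPPY]',
    ':)'  => '[HAPPY]',
    ':o)' => '[HAPPY]',
    ':-(' => '[SAD]',
    ':('  => '[SAD]',
);
$search     = array_keys($emoticons);
$replace    = array_values($emoticons);
$sample     = 'Lorem ipsum :-) dolor :-( samet';
$iterations = 100000;

// Time strtr() with the associative map.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $viaStrtr = strtr($sample, $emoticons);
}
$strtrSeconds = microtime(true) - $start;

// Time str_replace() with parallel search/replace arrays.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $viaStrReplace = str_replace($search, $replace, $sample);
}
$strReplaceSeconds = microtime(true) - $start;

printf("strtr:       %.4f s\n", $strtrSeconds);
printf("str_replace: %.4f s\n", $strReplaceSeconds);
```

Before trusting the timings, check that both calls produce the same output on your data.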
The simplest approach works when your "find definitions" contain no surrounding spaces:
$string = strtr( $string, $emoticons );
// Optional: build an alternation such as "HAPPY|SAD" from the unique labels
$labels = str_replace( '][', '|', trim( join( array_unique( $emoticons ) ), '[]' ) );
$string = preg_replace( '/\s*\[(' . $labels . ')\]\s*/', '[$1]', $string ); // stripping white space around word-styled emoticons
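As for the question's idea of keeping each emotion's emoticons in one space-separated string: one option is to keep that readable form as the source and flatten it once into the map strtr() expects. This is only a sketch; the $groups name is an assumption and the lists are shortened.

```php
<?php
// Grouped, space-separated lists as proposed in the question (shortened here).
// $groups is a hypothetical name used only for this example.
$groups = array(
    '[HAPPY]' => ':-) :) :o) :] =)',
    '[SAD]'   => ':-( :( :-|',
);

// Flatten once into an emoticon => label map.
$emoticons = array();
foreach ($groups as $label => $list) {
    foreach (explode(' ', $list) as $icon) {
        $emoticons[$icon] = $label;
    }
}

$string = 'Lorem ipsum :-) dolor :-| samet';
$result = strtr($string, $emoticons);

echo $result, "\n"; // Lorem ipsum [HAPPY] dolor [SAD] samet
```

Since the map is built only once, its cost should be negligible next to the millions of strtr() calls.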