
PHP - Replacing emoticons with their meaning


I am analysing informal chat-style messages for sentiment and other information. I need every emoticon replaced with its meaning, so that the system can parse the messages more easily.

At the moment I have the following code:

$str = "Am I :) or :( today?";

$emoticons = array(
    ':)'    =>  'happy',
    ':]'    =>  'happy',
    ':('    =>  'sad',
    ':['    =>  'sad',
);

$str = str_replace(array_keys($emoticons), array_values($emoticons), $str);

This does a direct string replacement, so it does not take into account whether the emoticon is surrounded by other characters.

How can I use a regex with preg_replace to determine whether a match is actually an emoticon and not part of a longer string?

Also, how can I restructure my array so that a single element, happy for example, can hold both :) and :]?


Solution

  • For maintainability and readability, I would change your emoticons array to:

    $emoticons = array(
        'happy' => array( ':)', ':]'),
        'sad'   => array( ':(', ':[')
    );
    

    Then you can build a lookup table equivalent to your original array, like this:

    $emoticon_lookup = array();
    foreach( $emoticons as $name => $values) {
        foreach( $values as $emoticon) {
            $emoticon_lookup[ $emoticon ] = $name;
        }
    }
    

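    Now you can build a regex dynamically from the emoticon lookup array. Note that this regex requires a non-word boundary (\B) on each side of the emoticon, so :) matches after a space or punctuation but not directly after a word character (letter, digit, or underscore). Change the anchors to suit your needs.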

    $escaped_emoticons = array_map( 'preg_quote', array_keys( $emoticon_lookup), array_fill( 0, count( $emoticon_lookup), '/'));
    $regex = '/\B(' . implode( '|', $escaped_emoticons) . ')\B/';
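
    As a rough illustration of "change it to what you need", the sketch below compares the \B anchors with a stricter variant that only accepts an emoticon delimited by whitespace or the ends of the string. The stricter pattern, the helper replace_using() and the variable names $loose_regex and $strict_regex are assumptions made for this example, not part of the original answer.

    ```php
    <?php
    // Lookup table as built above, repeated so the snippet runs on its own.
    $emoticon_lookup = array(
        ':)' => 'happy',
        ':]' => 'happy',
        ':(' => 'sad',
        ':[' => 'sad',
    );

    // Escape each emoticon for use inside a regex delimited by '/'.
    $escaped = array_map(function ($emoticon) {
        return preg_quote($emoticon, '/');
    }, array_keys($emoticon_lookup));
    $alternation = implode('|', $escaped);

    // Pattern from the answer: no word boundary on either side.
    $loose_regex = '/\B(' . $alternation . ')\B/';
    // Assumed stricter variant: whitespace or string edge on both sides.
    $strict_regex = '/(?<!\S)(' . $alternation . ')(?!\S)/';

    // Hypothetical helper: replace every match of $regex using $lookup.
    function replace_using($regex, $str, array $lookup)
    {
        return preg_replace_callback($regex, function ($match) use ($lookup) {
            return $lookup[$match[1]];
        }, $str);
    }

    // Trailing punctuation is accepted by \B but rejected by the strict variant.
    echo replace_using($loose_regex, 'Nice :)!', $emoticon_lookup), "\n";  // Nice happy!
    echo replace_using($strict_regex, 'Nice :)!', $emoticon_lookup), "\n"; // Nice :)!
    // An emoticon glued to a word is rejected by both.
    echo replace_using($loose_regex, 'word:) here', $emoticon_lookup), "\n"; // word:) here
    ```

    Which variant fits depends on the chat data: the \B version still catches an emoticon followed by punctuation, while the whitespace version is less likely to fire inside strings of symbols.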
    

    And then use preg_replace_callback() with a custom callback to implement the replacement:

    $str = preg_replace_callback( $regex, function( $match) use( $emoticon_lookup) {
        return $emoticon_lookup[ $match[1] ];
    }, $str);
    

    For the example string from the question, this outputs:

    Am I happy or sad today?
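
The steps above can be collected into a single function. This is only a sketch: the name replace_emoticons() and the guard for an empty table are assumptions added for illustration, while the lookup, regex and callback follow the snippets in the answer.

```php
<?php
/**
 * Replace emoticons in $str with their meaning.
 * $emoticons maps a meaning to the list of emoticons that express it.
 * (replace_emoticons is a hypothetical helper name, not from the answer.)
 */
function replace_emoticons($str, array $emoticons)
{
    // Flatten meaning => [emoticons] into emoticon => meaning.
    $lookup = array();
    foreach ($emoticons as $name => $values) {
        foreach ($values as $emoticon) {
            $lookup[$emoticon] = $name;
        }
    }
    if (!$lookup) {
        // Nothing to replace; an empty group would match the empty string.
        return $str;
    }

    // Escape regex metacharacters, including the '/' delimiter.
    $escaped = array_map(function ($emoticon) {
        return preg_quote($emoticon, '/');
    }, array_keys($lookup));
    $regex = '/\B(' . implode('|', $escaped) . ')\B/';

    // Swap each matched emoticon for its meaning.
    return preg_replace_callback($regex, function ($match) use ($lookup) {
        return $lookup[$match[1]];
    }, $str);
}

$emoticons = array(
    'happy' => array(':)', ':]'),
    'sad'   => array(':(', ':['),
);

echo replace_emoticons('Am I :) or :( today?', $emoticons), "\n";
// Am I happy or sad today?
echo replace_emoticons('word:) stays as it is', $emoticons), "\n";
// word:) stays as it is
```

Keeping the meaning-to-emoticons array as the only input means new emoticons can be added in one place, and the regex is rebuilt from it on each call.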