Tags: php, regex, symbols, pos-tagger

Regex to match specific symbols: ' - " ( ) * , . : … ; ? `


I want to build a regex that will tag these specific symbols as "SYM". To do that, I am writing a regex in PHP that matches ONLY these symbols. Is there a regex that accepts exactly these symbols?

'   -   "   (   )  *  ,  .   :  …  ;  ?  `

The output should look like this: ' \SYM - \SYM " \SYM ( \SYM ) \SYM and so on...

This is my program, but it doesn't work:

<?php
$str = "'this' is Mary! (a dog - not a human)";
$split = explode(" ", $str);
foreach ($split as $value) {
    $match = array();
    $count = preg_match_all("/\!/|\'/|\-/", $value, $match);
    if ($count != 0)
        $text = "\SYM";
    else
        $text = "\not SYM";
    echo "<br>".$count." ".$value." ".$text;
}
?>
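The loop above most likely fails because of its pattern: "/\!/|\'/|\-/" is not one regex but several delimited fragments glued together. PCRE takes /\!/ as the whole pattern, rejects the rest as unknown modifiers (PHP emits an "Unknown modifier" warning), and preg_match_all() returns false. A quieter bug is "\not SYM": inside a double-quoted PHP string, \n is a newline. Below is a minimal sketch of the same token-by-token loop with one delimited pattern. It assumes the symbol list from the question (so !, which the original pattern tried to match, is not tagged) and keeps the original rule that a token counts as a symbol token if it contains at least one listed character; $pattern and $lines are illustrative names.

```php
<?php
// Sketch: the question's per-token loop with a single, valid pattern.
// Assumption: a token is tagged \SYM if it contains at least one listed symbol.
$str = "'this' is Mary! (a dog - not a human)";
$split = explode(" ", $str);

// One character class of the listed symbols: the dash comes first so it is
// literal, \' is needed inside the single-quoted string, and /u makes PCRE
// read the multibyte "…" as one UTF-8 character.
$pattern = '/[-\'"()*,.:…;?`]/u';

$lines = [];
foreach ($split as $value) {
    // With a valid pattern, preg_match_all() returns the number of matches.
    $count = preg_match_all($pattern, $value);
    // Single quotes keep the \n in \not from turning into a newline.
    $text = $count !== 0 ? '\SYM' : '\not SYM';
    $lines[] = $count . " " . $value . " " . $text;
}
echo implode("<br>\n", $lines);
```

Tagging whole tokens like this still labels 'this' as a symbol token; inserting a mark after each symbol character is usually closer to the desired output.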

Solution

  • Your code may be as simple as:

    <?php
    $in = "'this' is Mary! (a dog - not a human)";
    // The u modifier makes PCRE read "…" as one UTF-8 character, not three bytes
    $out = preg_replace('/([-\'"()*,.:…;?`])/u', '\1\\SYM ', $in);
    echo $out;
    

    The regex /([-'"()*,.:…;?`])/ matches each of your special characters and captures it for later use. Take care with two details: the dash (-) should come first in the character class so it is not read as a range, and the single quote must be escaped because the pattern sits inside a single-quoted PHP string. Because … is a multibyte UTF-8 character, the pattern also needs the u modifier; without it, PCRE works byte by byte and matches each of the three bytes of … separately. The replacement substitutes each match with the first capture (the first capturing parenthesis from the left, hence \1), followed by the string \SYM and a space. If you need more whitespace around the mark, change the replacement string to something like ' \1\\SYM ', '\1 \\SYM ', or ' \1 \\SYM '.

    A more "sophisticated" (or elegant, or nerdy) method using lookarounds looks pretty much the same:

    $out = preg_replace('/(?<=[-\'"()*,.:…;?`])/u', '\SYM ', $in);
    

    The major difference is that it does not capture the special character; instead, it matches the position right after one (hence lookbehind). Only a position is matched here: that position (think of it as an empty string) is replaced by your mark, which effectively just inserts the mark.

    Both approaches deliver the same output:

    '\SYM this'\SYM  is Mary! (\SYM a dog -\SYM  not a human)\SYM
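To check the two approaches against each other, and to see why the u modifier matters for …, here is a small sketch that runs both patterns (with /u) on an input extended with an ellipsis, plus the capture pattern without /u. It assumes the PHP file is saved as UTF-8; the variable names are illustrative. preg_match('//u', $s) is used as a UTF-8 validity check, since it returns false for invalid UTF-8.

```php
<?php
// Sketch: compare both answers' patterns on input that contains "…".
// Assumes this file is saved as UTF-8.
$in = "'this' is Mary! (a dog - not a human)… really?";

// Approach 1: capture the symbol, re-emit it, then append \SYM and a space.
$viaCapture = preg_replace('/([-\'"()*,.:…;?`])/u', '\1\\SYM ', $in);

// Approach 2: match the empty position right after a symbol and insert the mark.
$viaLookbehind = preg_replace('/(?<=[-\'"()*,.:…;?`])/u', '\\SYM ', $in);

// Without /u the class holds the three bytes of "…" separately, so the mark
// is inserted between them and the result is no longer valid UTF-8.
$withoutU = preg_replace('/([-\'"()*,.:…;?`])/', '\1\\SYM ', $in);

// An empty pattern with /u returns 1 for valid UTF-8 and false otherwise.
var_dump($viaCapture === $viaLookbehind);       // bool(true)
var_dump(preg_match('//u', $viaCapture) === 1); // bool(true)
var_dump(preg_match('//u', $withoutU) === 1);   // bool(false)
echo $viaCapture, "\n";
```

With /u, … is tagged like any other listed symbol, and both approaches still agree.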
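If you would rather not hand-escape the character class, one option (not part of the answer above) is to build it from the symbol list with preg_quote(), which escapes regex metacharacters, including - and the delimiter passed as its second argument. This sketch also swaps the capturing group for $0, which refers to the whole match; $symbols, $class and $pattern are illustrative names.

```php
<?php
// Sketch: build the class from the list instead of escaping it by hand.
$symbols = ["'", '-', '"', '(', ')', '*', ',', '.', ':', '…', ';', '?', '`'];

// preg_quote() escapes metacharacters such as - ( ) * . : ? and the "/" delimiter.
$class = '[' . preg_quote(implode('', $symbols), '/') . ']';

// /u keeps "…" as one character; $0 in the replacement is the whole match.
$pattern = '/' . $class . '/u';
$out = preg_replace($pattern, '$0\\SYM ', "'this' is Mary! (a dog - not a human)");
echo $out, "\n";
```

The result matches the output of the two approaches above; the advantage is only that editing $symbols cannot accidentally create a range or an unescaped delimiter.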