
Parse a sentence with PHP preg_replace_callback and ignore data inside parentheses


I have this string:

A large hoofed mammal (Equus caballus) having a short-haired coat, a long mane, and a long tail, domesticated since ancient times and used for riding and for drawing or carrying loads.

which needs to be converted into this (every word of five or more letters becomes a link, except the words inside the parentheses):

A large hoofed mammal (Equus caballus) having a short-haired coat, a long mane, and a long tail, domesticated since ancient times and used for riding and for drawing or carrying loads.

These are the requirements:

  1. Words with a length of 5 or more letters must be wrapped in an <a href> tag. (This is already solved.)
  2. Words inside parentheses must be ignored. This is the requirement missing from the regex.

Currently, the code below converts the original string into this (words inside the parentheses are linked too, although they should be ignored):

A large hoofed mammal (Equus caballus) having a short-haired coat, a long mane, and a long tail, domesticated since ancient times and used for riding and for drawing or carrying loads.

This is my current code:

$result = preg_replace_callback('/\b[\p{L}\p{M}]{5,}\b/u', function ($matches) {
    // Wrap every word of 5+ letters in a link (this also hits words inside parentheses).
    return '<a href="http://words.com/' . strtolower($matches[0]) . '">' . $matches[0] . '</a>';
}, $data);

How can I implement the 2nd requirement in the same regex? Thanks!


Solution

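  • You can use a capture group: match a parenthesized section first and return it unchanged, so only words outside parentheses reach the link branch: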

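    $result = preg_replace_callback('~(\([^)]+\))|[\pL\pM]{5,}~u', function ($m) {
        // Group 1 is empty when the word branch matched: wrap the word in a link.
        if (empty($m[1]))
            return '<a href="http://words.com/' . strtolower($m[0]) . '">' . $m[0] . '</a>';
        // Otherwise the parenthesized branch matched: return it untouched.
        return $m[1];
    }, $data);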
    

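    Or you can use the backtracking control verbs (*SKIP)(*FAIL), which discard the parenthesized match and resume scanning after it, so the callback only ever sees words to link: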

    $result = preg_replace_callback('~\([^)]+\)(*SKIP)(*FAIL)|[\pL\pM]{5,}~u', function ($m) {
        return '<a href="http://words.com/' . strtolower($m[0]) . '">' . $m[0] . '</a>';
    }, $data);
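
To check the behaviour end to end, here is a minimal, self-contained sketch built on the (*SKIP)(*FAIL) variant. The function name `linkLongWords` and the shortened sample sentence are assumptions made for illustration; the pattern and the callback body are the ones shown above. The `?? $data` fallback is also an addition: it returns the input unchanged if `preg_replace_callback` yields `null` (for example on invalid UTF-8 input).

```php
<?php
// Hypothetical helper wrapping the (*SKIP)(*FAIL) pattern from the answer.
function linkLongWords(string $data): string
{
    $result = preg_replace_callback(
        // Branch 1: a parenthesized section, matched and then discarded.
        // Branch 2: a run of 5+ letters/combining marks, i.e. a word to link.
        '~\([^)]+\)(*SKIP)(*FAIL)|[\pL\pM]{5,}~u',
        function ($m) {
            return '<a href="http://words.com/' . strtolower($m[0]) . '">' . $m[0] . '</a>';
        },
        $data
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $data;
}

echo linkLongWords('A large mammal (Equus caballus) with a long tail.'), "\n";
// prints: A <a href="http://words.com/large">large</a> <a href="http://words.com/mammal">mammal</a> (Equus caballus) with a long tail.
```

Note that `\([^)]+\)` stops at the first closing parenthesis, so this sketch assumes the text contains no nested parentheses.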