php regex preg-replace str-replace word-boundary

PHP Array str_replace Whole Word

I'm doing str_replace on a very long string and my $search is an array.

$search = array(
    " tag_name_item ",
    " tag_name_item_category "
);

$replace = array(
    " tag_name_item{$suffix} ",
    " tag_name_item_category{$suffix} "
);

echo str_replace($search, $replace, $my_really_long_string);

I added spaces to both $search and $replace because I only want to match whole words. As you might guess from the code above, if I removed the spaces and my really long string were:

...
tag_name_item ...
tag_name_item_category ...
...

Then I would get something like

...
tag_name_item_sfx ...
tag_name_item_sfx_category ...
...

This is wrong because I want the following result:

...
tag_name_item_sfx ...
tag_name_item_category_sfx ...
...
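
For reference, the unwanted result can be reproduced with a short script. This is only a minimal sketch: the sample string and the "_sfx" suffix are assumptions taken from the example output above. str_replace works through the search array in order, each replacement operating on the result of the previous one, so the first entry has already rewritten the longer tag by the time the second entry is tried.

```php
<?php
// Assumed suffix and sample input, taken from the example above.
$suffix = '_sfx';
$my_really_long_string = "tag_name_item ...\ntag_name_item_category ...";

// Same arrays as in the question, but without the surrounding spaces.
$search  = array('tag_name_item', 'tag_name_item_category');
$replace = array("tag_name_item{$suffix}", "tag_name_item_category{$suffix}");

// The first pair is applied to the whole string before the second is tried,
// so "tag_name_item_category" no longer exists when its turn comes.
$unwanted = str_replace($search, $replace, $my_really_long_string);

echo $unwanted, "\n";
```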

So what's wrong?

Nothing really, it works. But I don't like it. Looks dirty, not well coded, inefficient.

I realized I could do something like this with regular expressions and the \b word boundary, but I'm not good with regex, so I don't know how to write the preg_replace call.

Solution

A possible approach using regular expressions could look like this:

$result = preg_replace(
    '/\b(tag_name_item(_category)?)\b/',
    '$1' . $suffix,
    $string
);

How it works:

\b: As you say, this is a word boundary. It ensures we're only matching whole words, not parts of words
(: We want to use part of our match in the replacement string (tag_name_item has to be replaced with itself + a suffix). That's why we use a capture group, so we can refer back to the match in the replacement string
tag_name_item: a literal match for that string
(_category)?: Another literal match, grouped and made optional through use of the ? quantifier. This ensures that we're matching both tag_name_item and tag_name_item_category
): end of the first group (the optional _category match is the second group). This group, essentially, holds the entire match we're going to replace
\b: word boundary again

These matches are replaced with '$1' . $suffix. The $1 is a reference to the first capture group (everything inside the outer parentheses in the expression). You could refer to the second group using $2, but we're not interested in that group right now.
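
Here is a runnable sketch of the above, assuming a sample input and a suffix of "_sfx" (both are placeholders, not your real data). It writes the reference as ${1} rather than $1. The two behave the same here, but ${1} stays unambiguous if $suffix ever begins with a digit, where '$1' followed by, say, "2" would be read as a reference to group 12.

```php
<?php
// Placeholder values for illustration only.
$suffix = '_sfx';
$string = "tag_name_item ...\ntag_name_item_category ...\nmy_tag_name_item ...";

$result = preg_replace(
    '/\b(tag_name_item(_category)?)\b/',
    '${1}' . $suffix,   // ${1} is the same reference as $1, delimited explicitly
    $string
);

echo $result, "\n";
```

The third line of the sample is left untouched: an underscore counts as a word character, so there is no word boundary between "my_" and "tag_name_item".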

That's all there is to it, really.
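
If the words live in an array, as in your question, the pattern could also be built from that array instead of being written by hand. This is a sketch under assumptions: $words, $suffix and the sample string are placeholders, and preg_quote is only there in case an entry ever contains regex metacharacters.

```php
<?php
// Placeholder data for illustration only.
$suffix = '_sfx';
$words  = array('tag_name_item', 'tag_name_item_category');
$string = "tag_name_item ...\ntag_name_item_category ...";

// Escape each word for use inside a /.../ pattern, then join them as alternatives.
$escaped = array();
foreach ($words as $word) {
    $escaped[] = preg_quote($word, '/');
}
$pattern = '/\b(' . implode('|', $escaped) . ')\b/';

// Each whole-word match is replaced with itself plus the suffix.
$result = preg_replace($pattern, '${1}' . $suffix, $string);

echo $result, "\n";
```

The trailing \b is what makes the order of the alternatives irrelevant here: the shorter word cannot match inside the longer one, because the underscore that follows it is a word character.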

More generic:

So, you're trying to suffix all strings starting with tag_name, which, judging by your example, can be followed by any number of snake_cased words. A more generic regex for that would look something like this:

$result = preg_replace(
    '/\b(tag_name[a-z_]*)\b/',
    '$1' . $suffix,
    $string
);

Like before, the use of \b, () and the tag_name literal remains the same. What changed is this:

[a-z_]*: This is a character class. It matches characters a-z (a to z), and underscores zero or more times (*). It matches _item and _item_category, just as it would match _foo_bar_zar_fefe.

These regexes are case-sensitive. If you want to match things like tag_name_XYZ, you'll probably want to use the i flag (case-insensitive): /\b(tag_name[a-z_]*)\b/i

Like before, the entire match is grouped and used in the replacement string, to which we add $suffix, whatever that might be.
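
To see the difference the i flag makes, here is a small sketch that runs the generic pattern both ways. The input string and the "_sfx" suffix are made up for illustration.

```php
<?php
// Placeholder values for illustration only.
$suffix = '_sfx';
$string = "tag_name_item tag_name_item_category tag_name_XYZ other_word";

// Case-sensitive: tag_name_XYZ is left alone.
$sensitive = preg_replace('/\b(tag_name[a-z_]*)\b/', '${1}' . $suffix, $string);

// Case-insensitive: the i flag lets [a-z_] match upper-case letters too.
$insensitive = preg_replace('/\b(tag_name[a-z_]*)\b/i', '${1}' . $suffix, $string);

echo $sensitive, "\n", $insensitive, "\n";
```

Note that the character class contains no digits, so a hypothetical name such as tag_name_item2 would be skipped entirely. If your tags can contain digits, [a-z0-9_]* or \w* would cover that.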