
PHP Array str_replace Whole Word


I'm doing str_replace on a very long string and my $search is an array.

$search = array(
    " tag_name_item ",
    " tag_name_item_category "
);

$replace = array(
    " tag_name_item{$suffix} ",
    " tag_name_item_category{$suffix} "
);

echo str_replace($search, $replace, $my_really_long_string);

I added spaces around both the $search and $replace strings because I only want to match whole words. As you may have guessed from the code above, if I removed the spaces and my really long string were:

...
tag_name_item ...
tag_name_item_category ...
...

Then I would get something like this:

...
tag_name_item_sfx ...
tag_name_item_sfx_category ...
...

This is wrong because I want the following result:

...
tag_name_item_sfx ...
tag_name_item_category_sfx ...
...
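The wrong result above can be reproduced with a short script. This is a minimal sketch: the suffix `_sfx` and the two-line sample string are assumptions standing in for the real `$suffix` and `$my_really_long_string`. The cause is that str_replace() with array arguments applies each search/replace pair in order, on the output of the previous pair, so the first pair also rewrites the prefix of `tag_name_item_category`.

```php
<?php
// Assumed values for illustration; the real suffix and string are not shown.
$suffix = '_sfx';
$my_really_long_string = "tag_name_item ...\ntag_name_item_category ...";

// Search terms WITHOUT the surrounding spaces.
$search  = ['tag_name_item', 'tag_name_item_category'];
$replace = ["tag_name_item{$suffix}", "tag_name_item_category{$suffix}"];

// Pair 1 turns "tag_name_item_category" into "tag_name_item_sfx_category",
// so pair 2 no longer finds anything to replace.
$result = str_replace($search, $replace, $my_really_long_string);

echo $result, "\n";
// tag_name_item_sfx ...
// tag_name_item_sfx_category ...
```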

So what's wrong?

Nothing, really; it works. But I don't like it: it looks dirty, poorly coded and inefficient.

I realized I could do this with regular expressions using the \b (word boundary) escape sequence, but I'm not good with regex, so I don't know how to write the preg_replace call.


Solution

  • A possible approach using regular expressions could look like this:

    $result = preg_replace(
        '/\b(tag_name_item(_category)?)\b/',
        '$1' . $suffix,
        $string
    );
    

    How it works:

    • \b: as you say, this is a word boundary. It ensures we only match whole words, not parts of words.
    • (: we want to reuse part of the match in the replacement string (tag_name_item has to be replaced with itself plus a suffix). That's why we use a capture group: it lets us refer back to the match in the replacement string.
    • tag_name_item: a literal match for that string.
    • (_category)?: another literal match, grouped and made optional by the ? quantifier. This ensures we match both tag_name_item and tag_name_item_category.
    • ): end of the first group (the optional _category match is the second group). This group holds the entire match we're going to replace.
    • \b: word boundary again
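    To see the pattern in action, here is a runnable sketch. The `_sfx` suffix is taken from the question's example output; the sample string is made up, and includes two near-misses that should be left alone.

```php
<?php
// Assumed values: '_sfx' as in the question's example; the sample text is invented.
$suffix = '_sfx';
$string = "tag_name_item ...\ntag_name_item_category ...\ntag_name_items my_tag_name_item";

// Whole-word match of tag_name_item, optionally followed by _category.
$result = preg_replace(
    '/\b(tag_name_item(_category)?)\b/',
    '$1' . $suffix,
    $string
);

echo $result, "\n";
// tag_name_item_sfx ...
// tag_name_item_category_sfx ...
// tag_name_items my_tag_name_item
```

    tag_name_items and my_tag_name_item are untouched because there is no word boundary between m and s, or between _ and t (all are word characters). The first line also shows that \b matches at the very start of the string, where the space-padded str_replace version has no leading space to match.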

    These matches are replaced with '$1' . $suffix. $1 is a backreference to the first capture group (everything inside the outer parentheses of the expression). You could refer to the second group with $2, but we don't need that group here.
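    One caveat with '$1' . $suffix: if $suffix ever starts with a digit, the replacement becomes something like '$12', which PHP reads as a reference to group 12 rather than group 1 followed by a literal 2. The PHP manual suggests the ${1} form for this case. A sketch, assuming a hypothetical suffix of '2':

```php
<?php
// Hypothetical suffix that starts with a digit.
$suffix = '2';
$string = 'tag_name_item and tag_name_item_category';

// '${1}' closes the group number explicitly, so the suffix's digit stays literal.
// Single quotes keep PHP from trying to interpolate ${1} itself.
$result = preg_replace(
    '/\b(tag_name_item(_category)?)\b/',
    '${1}' . $suffix,
    $string
);

echo $result, "\n"; // tag_name_item2 and tag_name_item_category2
```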

    That's all there is to it, really.


    More generic:

    So you're trying to suffix all strings starting with tag_name, which, judging by your example, can be followed by any number of snake_case words. A more generic regex for that would look something like this:

    $result = preg_replace(
        '/\b(tag_name[a-z_]*)\b/',
        '$1' . $suffix,
        $string
    );
    

    As before, the \b assertions, the () group and the tag_name literal stay the same. What changed is this:

    • [a-z_]*: this is a character class. It matches lowercase letters a to z and underscores, zero or more times (*). It matches _item and _item_category, just as it would match _foo_bar_zar_fefe.
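    A quick check of the generic pattern, assuming the same '_sfx' suffix and an invented sample string. tag_name_XYZ is included on purpose: it is not matched, because [a-z_]* cannot consume uppercase letters and there is no word boundary between _ and X.

```php
<?php
$suffix = '_sfx'; // assumed suffix
$string = 'tag_name_item tag_name_item_category tag_name_foo_bar_zar_fefe tag_name_XYZ';

// tag_name followed by any run of lowercase letters and underscores, as a whole word.
$result = preg_replace('/\b(tag_name[a-z_]*)\b/', '$1' . $suffix, $string);

echo $result, "\n";
// tag_name_item_sfx tag_name_item_category_sfx tag_name_foo_bar_zar_fefe_sfx tag_name_XYZ
```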

    These regexes are case-sensitive. If you want to match things like tag_name_XYZ, you'll probably want to add the i (case-insensitive) flag: /\b(tag_name[a-z_]*)\b/i

    As before, the entire match is captured as a group and reused in the replacement string, to which we append $suffix, whatever that might be.
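    The effect of the i flag can be sketched as follows, again assuming '_sfx' and an invented string. Note that i also makes the tag_name literal case-insensitive, so TAG_NAME_foo matches too. If only the tail should allow uppercase, a character class such as [A-Za-z_]* without the flag is one alternative; both are shown for comparison.

```php
<?php
$suffix = '_sfx'; // assumed suffix
$string = 'tag_name_item tag_name_XYZ TAG_NAME_foo';

// Case-insensitive: [a-z_] now also matches A-Z, and tag_name matches TAG_NAME.
$withFlag = preg_replace('/\b(tag_name[a-z_]*)\b/i', '$1' . $suffix, $string);

// Alternative: allow uppercase only after the literal, keeping tag_name case-sensitive.
$withClass = preg_replace('/\b(tag_name[A-Za-z_]*)\b/', '$1' . $suffix, $string);

echo $withFlag, "\n";  // tag_name_item_sfx tag_name_XYZ_sfx TAG_NAME_foo_sfx
echo $withClass, "\n"; // tag_name_item_sfx tag_name_XYZ_sfx TAG_NAME_foo
```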