Tags: php, regex, string, preg-match, reserved

Why does preg_match not find some literal words?


Using PHP, I am trying to test for the presence of various words and patterns in a string but am not able to figure out why I am seeing odd behaviour when attempting to match certain words.

Example 1: Why does the following not return 1?

$test = 'clen=a.le​ngth;for(i=0;i<clen;i++)b+=St​ring.fr​omCh​arCode(a.char​CodeAt(i)^2)';

$result = preg_match('/(string)/i', $test, $matches);

$result is always zero for the above even though the word "String" is present in the subject string.

Example 2: However, let's say I slightly change my regex to the following:

$test = 'clen=a.le​ngth;for(i=0;i<clen;i++)b+=St​ring.fr​omCh​arCode(a.char​CodeAt(i)^2)';
$result = preg_match('/st.+(ring)/i', $test, $matches);

The above returns the value of 1 for $result. Seems like when I split up the word "string" into separate parts, I can get a match.

Example 3: Once again when I slightly modify the regex in this example, it also returns zero but I'm not sure why:

$test = 'clen=a.le​ngth;for(i=0;i<clen;i++)b+=St​ring.fr​omCh​arCode(a.char​CodeAt(i)^2)';
$result = preg_match('/(tring)/i', $test, $matches);

Trying to match on the sequence of characters such as "tring" returns 0 but when matching on "ring" it returns 1. But "tring" doesn't sound like any type of special or reserved word!

This behaviour is also the same for various other words such as "document" and "unescape" and I'm sure there are many others.

I am assuming that some words are probably being treated differently by the regex engine because they might be reserved or special in some way but I have not been able to find an official explanation for the above behaviour.

I apologise if I am missing something really obvious and would really appreciate it if someone can please explain this to me. Many thanks.


Solution

  • I think your first regex is fine; see https://regex101.com/r/tO9vN8/1

    The problem seems to be with the character set rather than the pattern. I had to retype the expression by hand: when I copied it from this site, the regex did not match.

    I hope this points you in the right direction.
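To make the character-set point concrete, here is a minimal sketch of how invisible characters could be detected and stripped before matching. It assumes the subject contains a zero-width space (U+200B) inside "String", which is what the string pasted in the question appears to contain; the shortened $test value and the variable names $before, $hidden, $clean and $after are illustrative only, and the "\u{200B}" escape assumes PHP 7.0 or later.

```php
<?php
// Assumption: U+200B (zero-width space) sits inside "String" and the other
// identifiers, as in the string pasted in the question.
$test = "b+=St\u{200B}ring.fr\u{200B}omCh\u{200B}arCode(a.char\u{200B}CodeAt(i)^2)";

// The literal pattern fails because "St" and "ring" are not adjacent bytes.
$before = preg_match('/(string)/i', $test);

// \p{Cf} matches Unicode "format" characters such as the zero-width space.
// The /u modifier makes PCRE treat the pattern and subject as UTF-8.
$hidden = preg_match_all('/\p{Cf}/u', $test);

// Strip the format characters, then run the original pattern again.
$clean = preg_replace('/\p{Cf}+/u', '', $test);
$after = preg_match('/(string)/i', $clean, $matches);

var_dump($before, $hidden, $after, $matches[1]);
```

If this assumption holds for the real input, $before is 0 and $after is 1, which would also explain why "ring" matches while "tring" does not: the hidden character sits between "St" and "ring". Running bin2hex() on the subject is another quick way to see such bytes (U+200B is encoded as e2808b in UTF-8).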