Tags: php, regex, preg-match, text-extraction, mention

Detecting @mentions in a string returns two elements from one match


I have the following code:

$string = "Manual balls knit cardigan @120rb

ORDER
BB 28AFF6A6 atau 25AE5DB3 
Phone 081298249949 atau 081310570229 
Line indy2212 atau indy2281 
FORMAT
Nama 
Alamat 
Telp 
Kode barang";

if (preg_match('/(?<= )@([^@ ]+)/', $string, $matches)) {
    var_dump(count($matches));
    var_dump('first ' . $matches[0]);
    var_dump('second ' . $matches[1]);
}

However, this leaves $matches as an array with 2 elements, and the code prints the following:

2
@120rb ORDER BB
120rb ORDER BB

My question is: why? Why does it match the string twice? What is wrong with my regex?


Solution

  • preg_match() stores the matches into an array which you supply as the third parameter. In this case your preg_match() statement looks like:

    preg_match('/(?<= )@([^@ ]+)/', $string, $matches);
    

    So $matches contains the pieces of a single match, where:

    • $matches[0] will contain the text that matched the full pattern
    • $matches[1] will have the text matched by the first capturing group
    • $matches[2] will have the text matched by the second capturing group
    • and so on...

    The regular expression here is (?<= )@([^@ ]+). It matched only once: the full match, @120rb ORDER BB (line breaks included), is stored in $matches[0], whereas the capturing group ([^@ ]+) captures only the part after the @ (120rb ORDER BB), which is stored in $matches[1]. That is why the array has two elements.
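The difference between the full match and the capture group can be checked with a small sketch. The $sample string below is a shortened, hypothetical version of the input from the question, and $found and $m are names chosen only for this illustration:

```php
<?php
// Hypothetical, shortened version of the string from the question.
$sample = "Manual balls knit cardigan @120rb\n\nORDER\nBB 28AFF6A6 atau 25AE5DB3";

// Same pattern as in the question: one match, one capturing group.
$found = preg_match('/(?<= )@([^@ ]+)/', $sample, $m);

// $m[0] holds the text matched by the whole pattern (with the @),
// $m[1] holds the text matched by capture group 1 (without the @).
var_dump($found);     // number of matches found by preg_match(): 0 or 1
var_dump(count($m));  // full match + one group
var_dump($m[0]);
var_dump($m[1]);
```

Note that both elements span several lines here, because [^@ ] also accepts newline characters.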

    Currently, the regular expression doesn't detect a mention at the beginning of the string, because the lookbehind requires a preceding space. It also runs past the end of the line: [^@ ] matches any character that is not an @ or a literal space, so newlines are matched too. I'd use the following expression with preg_match_all():

    (?<=^|\s)@([^@\s]+)
    

    Code:

    if (preg_match_all('/(?<=^|\s)@([^@\s]+)/', $string, $matches)) {
        print_r($matches[1]);
    }
    

    To get the number of matches, you can use echo count($matches[0]);, or the return value of preg_match_all(), which is the number of full pattern matches.
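As a rough check of the revised pattern, the sketch below runs it on a made-up input. The $text string and the names alice, bob and carol are assumptions for illustration only and do not come from the question:

```php
<?php
// Hypothetical input: a mention at the start of the string, one after a
// space, one after a line break, and an e-mail address that should be ignored.
$text = "@alice says hi to @bob\n@carol and me@example.com";

// (?<=^|\s) requires the @ to sit at the start of the string or after
// whitespace; [^@\s]+ stops at the next whitespace character or @.
$count = preg_match_all('/(?<=^|\s)@([^@\s]+)/', $text, $matches);

print_r($matches[1]);  // the captured names, without the leading @
echo $count, "\n";     // same value as count($matches[0])
```

The @ in me@example.com is preceded by a letter rather than whitespace, so the lookbehind rejects it.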
