php regex google-analytics google-tag-manager

Adding functionality to a regular expression that finds analytics tracking IDs


I have a function that fetches a webpage and uses a regular expression to tell me whether a Google Analytics tracking code is on it:

 function checkUA($domain) {
    $input = file_get_contents($domain);
    if ( $input !== false ){
        //regex to check for UA or GA4 code that returns all matches
        $regex = '/\b[A-Z][A-Z0-9]?-[A-Z0-9]{4,10}(?:\-[1-9]\d{0,3})?\b/';
        //if there is a match, return a unique array (tracking may be included more than once)
        if(preg_match_all($regex, $input,$matches)){
            return array_unique($matches[0]);
        }else{
            //if no match is found, let us know
            return 'no match found';    
        }
    }else{
        return 'Site is blocked from crawlers';
    }
 }

This function will find any tracking IDs that start with UA, YT, MO, G, DC, or AW, for example:

UA-12345-1 G-J2DV45G DC-JGWWE32 AW-GER322

I'm trying to extend the regular expression so that it also finds Google Tag Manager IDs:

 $regex = '/\b[A-Z][A-Z0-9]?-[A-Z0-9]{4,10}(?:\-[1-9]\d{0,3})?\b/';

For example, an ID that looks like this: GTM-5TDMDSZ

I can't figure out how to extend the regular expression above so that it also matches GTM IDs like this one.


Solution

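  • As fourthbird mentioned, a pipe (|), which means "or" in a regular expression, will do the job. Your current pattern cannot match GTM IDs because its prefix part, [A-Z][A-Z0-9]?, allows only one or two characters before the hyphen, and GTM has three.

    I recommend tightening your existing pattern so that it only matches the specific tracker prefixes that you intend to target.

    To make your code easier to read, use the x pattern modifier so that literal whitespace in the pattern is ignored by the regex engine. With that modifier, a # outside of a character class starts a comment that runs to the end of the line, so you can annotate each part of the pattern.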

    Code:

    $string = 'UA-12345-1 G-J2DV45G NOPE DC-JGWWE32 AW-GER322 NAH-MATE GTM-5TDMDSZ G-WIZ';
    
    $trackingPrefixes = ['UA', 'YT', 'MO', 'G', 'DC', 'AW'];
    
    preg_match_all(
        '/\b
            (?:
               (?:' . implode('|', $trackingPrefixes) . ')-[A-Z\d]{4,10}(?:-[1-9]\d{0,3})?   #Tracker Ids
               |
               GTM-[A-Z\d]+                                                                  #Google Tag Manager Ids
            )
        \b/x',
        $string,
        $m
    );
    var_export($m[0]);
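
To tie this back to the question, the pattern above could be wrapped in a helper along the following lines. This is only a minimal sketch under some assumptions: the function name findTrackingIds is hypothetical and not part of the original posts, it takes markup that has already been fetched (for example with file_get_contents) instead of a URL, and it returns an empty array when nothing matches so that the caller always receives the same type.

```php
<?php
// Hypothetical helper (not from the original posts): extracts the unique
// tracking IDs from markup that has already been fetched.
function findTrackingIds(string $html): array
{
    // Prefixes listed in the question; adjust this list as needed.
    $trackingPrefixes = ['UA', 'YT', 'MO', 'G', 'DC', 'AW'];

    $pattern = '/\b
        (?:
            (?:' . implode('|', $trackingPrefixes) . ')-[A-Z\d]{4,10}(?:-[1-9]\d{0,3})?   # tracker IDs
            |
            GTM-[A-Z\d]+                                                                  # Google Tag Manager IDs
        )
    \b/x';

    // preg_match_all() returns the number of full matches (0 if none, false on error).
    if (!preg_match_all($pattern, $html, $matches)) {
        return [];
    }

    // Drop duplicates, then reindex so the result is a plain list.
    return array_values(array_unique($matches[0]));
}

$html = '<script>gtag("config", "UA-12345-1");</script> GTM-5TDMDSZ UA-12345-1 G-WIZ';
var_export(findTrackingIds($html));
```

For the sample markup, the call returns UA-12345-1 and GTM-5TDMDSZ: the repeated UA ID is collapsed by array_unique, and G-WIZ is skipped because fewer than four characters follow the hyphen. Keeping the fetch separate from the matching also lets the caller decide how to report a failed request.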