javascript, regex, google-analytics, url-parameters, universal-analytics

Regex Help - Match any URL Parameter & Value not in List


Thank you for looking at this!

I am trying to build a regex that works in JavaScript and matches ALL URL parameters and their values that are not in my predefined list. Example:

Raw URL:

/folder/index.html?knownParamA=1234&unknownParamA=1234&knownParamB=1234&unknownParamB=1234

My List of Known Parameters:

/((knownParamA|knownParamB|knownParamC)=[^&]*&?)/gi

Resulting (Cleaned up) URL:

/folder/index.html?knownParamA=1234&knownParamB=1234

Ultimately, I want to capture a cleaned-up version of any URL with only the parameters and values I need. There are tons of parameters on my website that are meaningless to me and only get in the way. One solution I found required a lookbehind, but I don't think JavaScript supports those.

Thank you so much for the help!!!

Solution Based on Feedback Below:

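// Path plus query string of the current page
var pageURL = window.location.pathname + window.location.search;
var knownParams = 'knownParamA|knownParamB|knownParamC|knownParamD';

// Step 1: strip every parameter whose name is not in knownParams
var urlCleanerRegexStep1 = new RegExp('[?&](?!(?:' + knownParams + ')(?==))[^=]+=[^&]*', 'gi');
// Step 2: make sure the first remaining parameter is introduced by "?"
var urlCleanerRegexStep2 = new RegExp('[?&]([^=]+=[^&]*)', '');
var cleanPageURL = pageURL.replace(urlCleanerRegexStep1, '').replace(urlCleanerRegexStep2, '?$1');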

Solution

  • Negative searches are tricky, and require zero-width lookaheads.

    This will find the unknown parameters and strip them out of the URL. (Update 2: it no longer keeps unknown parameters whose names merely start with a known parameter name.)

    step1 = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '');
    // "/folder/index.html?knownParamA=1234&knownParamB=1234"
    

    However, if the first parameter gets stripped out, your first remaining parameter will be preceded by a & instead of a ?, and you will need to replace that too:

    clean = step1.replace(/[?&]([^=]+=[^&]*)/, '?$1');
    // "/folder/index.html?knownParamA=1234&knownParamB=1234"
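
To make the two steps concrete, here is a small sketch run against a made-up URL. The `sampleUrl` value and the variable names are illustrative assumptions, not taken from the question. It shows the leading `&` that step 1 can leave behind, and how the `(?==)` lookahead stops a parameter such as `knownParamAExtra` from being mistaken for `knownParamA`.

```javascript
// Hypothetical sample URL: the first parameter is unknown, so step 1 removes the "?" along with it.
const sampleUrl = '/folder/index.html?unknownParamA=1&knownParamA=2&knownParamAExtra=3';

// The two patterns from the answer above, given names for readability.
const stripUnknown = /[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi;
const fixLeadingSeparator = /[?&]([^=]+=[^&]*)/;

// Step 1: both unknownParamA and knownParamAExtra are removed.
const afterStep1 = sampleUrl.replace(stripUnknown, '');
console.log(afterStep1); // "/folder/index.html&knownParamA=2"

// Step 2: the first remaining separator becomes "?".
const afterStep2 = afterStep1.replace(fixLeadingSeparator, '?$1');
console.log(afterStep2); // "/folder/index.html?knownParamA=2"
```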
    

    You can chain these together, of course:

    clean = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '').
      replace(/[?&]([^=]+=[^&]*)/, '?$1');
    

    Update: I have included user3842539's expansion of the code, as it's easier to read here than in a comment.

    var pageURL = window.location.pathname + window.location.search;
    var knownParams = 'knownParamA|knownParamB|knownParamC|knownParamD';
    // Step 1 strips unknown parameters; step 2 restores the leading "?"
    var urlCleanerRegexStep1 = new RegExp('[?&](?!(?:' + knownParams + ')(?==))[^=]+=[^&]*', 'gi');
    var urlCleanerRegexStep2 = new RegExp('[?&]([^=]+=[^&]*)', '');
    var cleanPageURL = pageURL.replace(urlCleanerRegexStep1, '').replace(urlCleanerRegexStep2, '?$1');
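
One way to package this is a small helper that builds the pattern from an array of names. This is only a sketch and not part of the original answer: the function name `cleanUrl`, the `escapeRegExp` helper, and the array argument are all hypothetical. Escaping the names is an added precaution so that a name containing regex metacharacters (such as `.` or `[`) is matched literally. Like the regexes above, it assumes every parameter has the form `name=value`.

```javascript
// Hypothetical helper: escape regex metacharacters so names are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical wrapper around the two-step replacement.
function cleanUrl(url, knownParams) {
  const names = knownParams.map(escapeRegExp).join('|');
  // Step 1: strip every "name=value" pair whose name is not in the list.
  const stripUnknown = new RegExp('[?&](?!(?:' + names + ')(?==))[^=]+=[^&]*', 'gi');
  // Step 2: make sure the first remaining pair is introduced by "?".
  const fixLeadingSeparator = /[?&]([^=]+=[^&]*)/;
  return url.replace(stripUnknown, '').replace(fixLeadingSeparator, '?$1');
}

// The raw URL from the question.
const exampleUrl = '/folder/index.html?knownParamA=1234&unknownParamA=1234&knownParamB=1234&unknownParamB=1234';
console.log(cleanUrl(exampleUrl, ['knownParamA', 'knownParamB', 'knownParamC']));
// "/folder/index.html?knownParamA=1234&knownParamB=1234"
```

Passing `window.location.pathname + window.location.search` as the first argument would reproduce the browser usage shown above.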
    

    To help you interpret these regexes:

    • [?&] = either ? or &
    • (...) = capturing group
    • (?!...) = negative lookahead: not followed by a match for this group
    • (?:...) = non-capturing group
    • (?=...) = positive lookahead: followed by a match for this group
    • = = a literal =
    • [^=] = any character other than =
    • + = one or more times
    • [^&] = any character other than &
    • * = zero or more times

    Outside the regex body,

    • The g flag means 'all matches' (as opposed to only the first)
    • The i flag means 'case-insensitive'
    • In the replacement string, $1 means 'captured group 1'
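
The flags and the `$1` reference can also be seen in isolation. The sketch below uses a made-up query string and illustrative variable names, none of which come from the original answer.

```javascript
// Hypothetical query string with mixed-case parameter names.
const query = '?Foo=1&foo=2&bar=3';

// No flags: only the first, case-sensitive match is replaced.
const firstOnly = query.replace(/foo/, 'X');
console.log(firstOnly); // "?Foo=1&X=2&bar=3"

// g: every case-sensitive match is replaced.
const allMatches = query.replace(/o/g, '0');
console.log(allMatches); // "?F00=1&f00=2&bar=3"

// gi: every match is replaced, ignoring case.
const ignoreCase = query.replace(/foo/gi, 'X');
console.log(ignoreCase); // "?X=1&X=2&bar=3"

// $1 inserts whatever the first capturing group matched.
const withGroup = query.replace(/[?&]([^=]+=[^&]*)/, '#$1');
console.log(withGroup); // "#Foo=1&foo=2&bar=3"
```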