Thank you for looking at this!
I am trying to build a regex that works in JavaScript and matches all URL parameters (and their values) that are not in my predefined list. Example:
Raw URL:
/folder/index.html?knownParamA=1234&unknownParamA=1234&knownParamB=1234&unknownParamB=1234
My list of known parameters:
/((knownParamA|knownParamB|knownParamC)=[^&]*&?)/gi
Resulting (Cleaned up) URL:
/folder/index.html?knownParamA=1234&knownParamB=1234
Ultimately, I want to capture a cleaned-up version of any URL with only the parameters and values I need. There are tons of parameters on my website that are meaningless to me and only get in the way. One solution I found required a lookbehind, but I don't think JavaScript supports those.
Thank you so much for the help!!!
Solution Based on Feedback Below:
var pageURL = window.location.pathname + window.location.search;
var knownParams = 'knownParamA|knownParamB|knownParamC|knownParamD';
var urlCleanerRegexStep1 = new RegExp('[?&](?!(?:' + knownParams + ')(?==))[^=]+=[^&]*', 'gi');
var urlCleanerRegexStep2 = new RegExp('[?&]([^=]+=[^&]*)', '');
var cleanPageURL = pageURL.replace(urlCleanerRegexStep1, '').replace(urlCleanerRegexStep2, '?$1');
Negative matches are tricky and require zero-width negative lookaheads.
This finds the unknown parameters and strips them out of the URL. (Update 2: it no longer keeps unknown parameters whose names start with a known parameter name.)
step1 = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '');
// "/folder/index.html?knownParamA=1234&knownParamB=1234"
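To illustrate why the inner (?==) matters, here is a small sketch comparing the pattern with and without it. The sample URL and the variable names loose and strict are made up for the example; knownParamAB stands in for an unknown parameter whose name merely starts with a known one.

```javascript
// Hypothetical sample: knownParamAB is NOT a known parameter, it only starts like one.
var url = '/folder/index.html?knownParamA=1&knownParamAB=2&other=3';

// Without (?==): any name that begins with a known name is treated as known and kept.
var loose = /[?&](?!(?:knownParamA|knownParamB))[^=]+=[^&]*/gi;

// With (?==): the known name must be followed immediately by "=" to be kept.
var strict = /[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi;

var looseResult = url.replace(loose, '');
// "/folder/index.html?knownParamA=1&knownParamAB=2"

var strictResult = url.replace(strict, '');
// "/folder/index.html?knownParamA=1"
```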
However, if the first parameter gets stripped out, your first remaining parameter will be preceded by a & instead of a ?, and you will need to replace that too:
clean = step1.replace(/[?&]([^=]+=[^&]*)/, '?$1');
// "/folder/index.html?knownParamA=1234&knownParamB=1234"
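The second step only changes anything when the leading parameter was unknown. A minimal sketch of that case, using a made-up URL in which the first parameter is one that should be removed:

```javascript
// Hypothetical sample: the first parameter is unknown, so removing it also removes the "?".
var url = '/folder/index.html?unknownParamA=1234&knownParamA=1234&knownParamB=1234';

// Step 1: strip every parameter that is not in the known list.
var step1 = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '');
// "/folder/index.html&knownParamA=1234&knownParamB=1234"

// Step 2: no g flag, so only the first remaining separator is rewritten to "?".
var clean = step1.replace(/[?&]([^=]+=[^&]*)/, '?$1');
// "/folder/index.html?knownParamA=1234&knownParamB=1234"
```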
You can chain these together, of course:
clean = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '').
replace(/[?&]([^=]+=[^&]*)/, '?$1');
Update: I have included user3842539's expansion of the code, as it's easier to read here than in a comment.
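// Path plus query string of the current page.
var pageURL = window.location.pathname + window.location.search;
// Parameter names to keep, written as a regex alternation.
var knownParams = 'knownParamA|knownParamB|knownParamC|knownParamD';
// Step 1: match every parameter whose name is not exactly one of the known names.
var urlCleanerRegexStep1 = new RegExp('[?&](?!(?:' + knownParams + ')(?==))[^=]+=[^&]*', 'gi');
// Step 2: match the first remaining parameter so its separator can be reset to "?".
var urlCleanerRegexStep2 = new RegExp('[?&]([^=]+=[^&]*)', '');
var cleanPageURL = pageURL.replace(urlCleanerRegexStep1, '').replace(urlCleanerRegexStep2, '?$1');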
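Because the known names are concatenated straight into the pattern, a name containing a regex metacharacter (a dot, for instance) would be interpreted as regex syntax. One possible way to guard against that is sketched below. The helpers escapeRegExp and cleanUrl are hypothetical names introduced for this example, and the function takes the URL as an argument instead of reading window.location so it can run outside a browser.

```javascript
// Escape regex metacharacters so each parameter name is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: keep only the parameters listed in knownParams (an array of names).
function cleanUrl(pageURL, knownParams) {
  var alternation = knownParams.map(escapeRegExp).join('|');
  // Same two patterns as above, built from the escaped names.
  var stripUnknown = new RegExp('[?&](?!(?:' + alternation + ')(?==))[^=]+=[^&]*', 'gi');
  var fixLeadingSeparator = /[?&]([^=]+=[^&]*)/;
  return pageURL.replace(stripUnknown, '').replace(fixLeadingSeparator, '?$1');
}

var cleaned = cleanUrl(
  '/folder/index.html?knownParamA=1234&unknownParamA=1234&knownParamB=1234&unknownParamB=1234',
  ['knownParamA', 'knownParamB', 'knownParamC']
);
// "/folder/index.html?knownParamA=1234&knownParamB=1234"
```

Like the original patterns, this sketch assumes every parameter has the form name=value; a bare flag with no = is not handled.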
To help you interpret these regexes:
[?&] = either ? or &
(...) = capturing group
(?!...) = negative lookahead: not followed by a match for this group
(?:...) = non-capturing group
(?=...) = positive lookahead: followed by a match for this group
= = a literal =
[^=] = any character other than =
+ = one or more times
[^&] = any character other than &
* = zero or more times
Outside the regex body:
g flag means 'all matches' (as opposed to only the first)
i flag means 'case-insensitive'
$1 means 'captured group 1'
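The effect of the flags and of $1 can be seen on a toy query string. The string and variable names below are invented for the illustration and are not part of the URL cleaner itself.

```javascript
// Hypothetical query string with a repeated and a differently cased parameter name.
var query = '?a=1&b=2&a=3&A=4';

// No flags: only the first match is removed, and matching is case-sensitive.
var noFlags = query.replace(/[?&]a=[^&]*/, '');
// "&b=2&a=3&A=4"

// g: every case-sensitive match is removed.
var globalOnly = query.replace(/[?&]a=[^&]*/g, '');
// "&b=2&A=4"

// g and i: every match is removed regardless of case.
var globalIgnoreCase = query.replace(/[?&]a=[^&]*/gi, '');
// "&b=2"

// $1 re-inserts the text captured by the first pair of parentheses.
var withQuestionMark = globalIgnoreCase.replace(/[?&]([^=]+=[^&]*)/, '?$1');
// "?b=2"
```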