I have a regular expression that captures three backreferences, though one (the 2nd) may be null.
Given the following string:
http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajonathonoat.es&source=web&cd=1&ved=0CC8QFjAA&url=http%3A%2F%2Fjonathonoat.es%2Fbritish-mozcast%2F&ei=MQj9UKejDYeS0QWruIHgDA&usg=AFQjCNHy1cDoWlIAwyj76wjiM6f2Rpd74w&bvm=bv.41248874,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1
I wish to capture the TLD (in this case .co.uk), the q param and the cd param.
I'm using the following RegEx:
/.*\.google([a-z\.]*).*q=(.*[^&])?.*cd=(\d*).*/i
This works, except that the 2nd backreference includes the other parameters up to the cd param. I currently get this:
["http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajo…,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1 ", ".co.uk", "site%3Ajonathonoat.es&source=web", "1", index: 0, input: "http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajo…,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1"]
The 1st backreference is correct (.co.uk) and so is the 3rd (1). I want the 2nd backreference to be either null (or undefined or whatever) or just the q param, in this example site%3Ajonathonoat.es. It currently includes the source param too (site%3Ajonathonoat.es&source=web).
Any help would be much appreciated, thanks!
I've added a JSFiddle of the code; look in your browser console for the output.
You want the middle group to be:
q=([^&]*)
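This captures any run of characters other than an ampersand, so the group stops at the end of the q value instead of running on to the next parameter. Because * also allows zero characters, an empty q value still matches, so you can remove the optional quantifier (?).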
Working example: http://rubular.com/r/AJkXxgeX5K
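For reference, here is a minimal sketch of the full pattern with the middle group swapped in, run in JavaScript as in the question. The URL is a shortened copy of the one in the question (the trailing parameters are dropped for readability), and the second URL with an empty q param is a made-up example, not taken from the original post.

```javascript
// Pattern from the question, with the q group limited to non-ampersand characters.
const pattern = /.*\.google([a-z\.]*).*q=([^&]*).*cd=(\d*).*/i;

// Shortened version of the URL from the question (trailing parameters dropped).
const url = "http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajonathonoat.es&source=web&cd=1&ved=0CC8QFjAA";

const match = url.match(pattern);
console.log(match[1]); // ".co.uk"
console.log(match[2]); // "site%3Ajonathonoat.es"
console.log(match[3]); // "1"

// Hypothetical URL with an empty q param: group 2 is "" rather than undefined.
const emptyQ = "http://www.google.com/url?sa=t&q=&cd=3".match(pattern);
console.log(emptyQ[2] === ""); // true
```

Note that the pattern still requires a literal q= somewhere before cd=, so a URL with no q param at all makes match() return null rather than leaving group 2 undefined.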