I need to extract only part of a URL with PHP, but I am struggling to set the point where the extraction should stop. I used a regex to extract the entire URL from a longer string like this:
$regex = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
preg_match_all($regex, $href, $matches);
The result is the following string:
http://www.cambridgeenglish.org/test-your-english/&sa=U&ei=a4rbU8agB-zY0QWS_IGYDw&ved=0CFEQFjAL&usg=AFQjCNGU4FMUPB2ZuVM45OoqQ39rJbfveg
Now I want to extract only this part: http://www.cambridgeenglish.org/test-your-english/. Basically, I need to get rid of everything from the first & onwards.
Does anyone have an idea how to achieve this? Do I need to run another regex, or can I add it to the initial one?
The regex below removes the first & and everything after it. Your PHP code would be:
<?php
// Remove the first "&" and everything after it
echo preg_replace('~&.*$~', '', 'http://www.cambridgeenglish.org/test-your-english/&sa=U&ei=a4rbU8agB-zY0QWS_IGYDw&ved=0CFEQFjAL&usg=AFQjCNGU4FMUPB2ZuVM45OoqQ39rJbfveg');
// Output: http://www.cambridgeenglish.org/test-your-english/
?>
Explanation:
& matches the literal character &.
.* matches any character except a newline, zero or more times.
$ asserts the end of the string.
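If you would rather have the initial regex stop at the first & (as asked in the question), one option is to remove & from both of its character classes. The sketch below compares that with keeping the original regex and trimming the matches afterwards. $href here is a made-up sample string standing in for your input, and $regexNoAmp, $option1 and $option2 are illustrative names. Both approaches assume every & marks the start of the unwanted parameters; a URL with a real query string such as ?a=1&b=2 would also be cut after a=1.

```php
<?php
// Made-up sample input standing in for $href: the URL embedded in a longer string.
$href = 'Link: http://www.cambridgeenglish.org/test-your-english/&sa=U&ei=a4rbU8agB-zY0QWS_IGYDw&ved=0CFEQFjAL&usg=AFQjCNGU4FMUPB2ZuVM45OoqQ39rJbfveg (tracking)';

// Option 1: the original regex with "&" dropped from both character classes,
// so each match ends just before the first "&".
$regexNoAmp = '/\b(https?|ftp|file):\/\/[-A-Z0-9+@#\/%?=~_|$!:,.;]*[A-Z0-9+@#\/%=~_|$]/i';
preg_match_all($regexNoAmp, $href, $matches);
$option1 = $matches[0];

// Option 2: keep the original regex, then strip "&..." from every match.
// preg_replace() accepts an array subject and returns an array of results.
$regex = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
preg_match_all($regex, $href, $matches);
$option2 = preg_replace('~&.*$~', '', $matches[0]);

// Both lines print: http://www.cambridgeenglish.org/test-your-english/
foreach ([$option1, $option2] as $urls) {
    echo implode(PHP_EOL, $urls), PHP_EOL;
}
```

Option 1 does everything in one pass, which answers "can I add it to the initial one?". Option 2 keeps your existing regex untouched and is easier to change later if you decide some & characters should be kept.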