Extract specific part of URL from string

I need to extract only part of a URL with PHP, but I am struggling to set the point where the extraction should stop. I used a regex to extract the entire URL from a longer string like this:

$regex = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
preg_match_all($regex, $href, $matches);

The result is the following string:

http://www.cambridgeenglish.org/test-your-english/&amp;sa=U&amp;ei=a4rbU8agB-zY0QWS_IGYDw&amp;ved=0CFEQFjAL&amp;usg=AFQjCNGU4FMUPB2ZuVM45OoqQ39rJbfveg

Now I want to extract only this bit: http://www.cambridgeenglish.org/test-your-english/. Basically, I need to get rid of everything from &amp onwards.

Does anyone have an idea how to achieve this? Do I need to run another regex, or can I add it to the initial one?

Solution

The regex below removes everything from the string &amp onwards. Your PHP code would be:

<?php
$url = 'http://www.cambridgeenglish.org/test-your-english/&amp;sa=U&amp;ei=a4rbU8agB-zY0QWS_IGYDw&amp;ved=0CFEQFjAL&amp;usg=AFQjCNGU4FMUPB2ZuVM45OoqQ39rJbfveg';
// Remove everything from the first "&amp" to the end of the string.
echo preg_replace('~&amp.*$~', '', $url);
// => http://www.cambridgeenglish.org/test-your-english/
?>

Explanation:

&amp Matches the literal string &amp.
.* Matches any character (except a newline) zero or more times.
$ Anchors the match at the end of the string.
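
To address the question of whether this needs a second regex or can be folded into the first one, here is a rough sketch of both options. The function names and the sample input string are made up for illustration, and the single-pattern variant is only one possible way to do it: it adds a negative lookahead, (?!&amp), to the question's pattern so that matching stops just before the first &amp.

```php
<?php
// Hypothetical helper names; the URL pattern is the one from the question.

// Option 1: two steps. Match the full URL, then strip from "&amp" onwards.
function extractBaseUrlsTwoStep($text)
{
    $regex = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
    preg_match_all($regex, $text, $matches);

    // $matches[0] holds the full matches.
    return array_map(function ($url) {
        return preg_replace('~&amp.*$~', '', $url);
    }, $matches[0]);
}

// Option 2: one step. The lookahead (?!&amp) refuses to consume any
// character that starts "&amp", so the match ends right before it.
function extractBaseUrlsOneStep($text)
{
    $regex = '/\b(?:https?|ftp|file):\/\/(?:(?!&amp)[-A-Z0-9+&@#\/%?=~_|$!:,.;])*(?!&amp)[A-Z0-9+&@#\/%=~_|$]/i';
    preg_match_all($regex, $text, $matches);

    return $matches[0];
}

// Illustrative input only: a URL embedded in a longer string.
$sample = 'See http://www.cambridgeenglish.org/test-your-english/&amp;sa=U&amp;ei=a4rbU8agB-zY0QWS_IGYDw for details';

print_r(extractBaseUrlsTwoStep($sample));
print_r(extractBaseUrlsOneStep($sample));
```

Both calls should return a one-element array containing http://www.cambridgeenglish.org/test-your-english/ for this input. The two-step version is probably easier to read and maintain. Note that either approach also cuts off legitimate query parameters that follow an HTML-encoded ampersand.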