How to match URLs that contain white space with preg_match?

I am parsing a text that contains several links. Some of them contain spaces but end with a file extension. My current pattern is:

preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $links, $match);

This pattern behaves the same way:

preg_match_all('/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/', $links, $match);

I don't know much about regex patterns and haven't found a good tutorial that explains the syntax and shows examples.
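Since the pattern syntax is the confusing part, here is the first pattern from above rewritten with PCRE's x (extended) modifier, which ignores whitespace in the pattern and allows # comments, so each part can be annotated. The delimiter is switched from # to ~ because # starts a comment in x mode; the sample text is made up for illustration. Running it also shows where the problem comes from: the class [^\s()<>]+ stops at the first space.

```php
<?php
// First pattern from the question, written with the x modifier.
// Whitespace outside character classes is ignored and "#" starts a comment,
// so "~" is used as the delimiter instead of "#".
$pattern = '~
    \b https?://                # "http://" or "https://" at a word boundary
    [^\s()<>]+                  # one or more chars that are NOT whitespace, ( ) < or >
    (?:                         # the URL must then end with either...
        \( [\w\d]+ \)           #   a parenthesised word such as "(foo)"
      | ( [^[:punct:]\s] | / )  #   or one char that is neither punctuation nor whitespace, or a "/"
    )
~x';

$text = 'see http://my-url.com/my doc.doc here';
preg_match_all($pattern, $text, $match);

// The match ends right before the first space: "http://my-url.com/my"
print_r($match[0]);
```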

How can I match a URL like http://my-url.com/my doc.doc, or even http://my-url.com/my doc with more white spaces.doc?

The \s in those preg_match_all patterns stands for a whitespace character. But how can I check whether a file extension follows after one or more spaces?

Is it possible?
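If the links are not wrapped in any delimiter, one possible sketch is to allow spaces only in the last path segment and to require that segment to end in a known file extension. The extension list here (doc, docx, pdf, jpg, jpeg, png) is an assumption borrowed from the extensions used in the solution, and the sample text is made up. The main caveat: in plain text a regex cannot tell where a link stops and ordinary prose begins, so "http://my-url.com/page and report.pdf" would be matched as a single link, and spaces in directory names (before the last "/") are not supported.

```php
<?php
// Sketch only: spaces are allowed in the last path segment, which must
// end in one of a few assumed file extensions (case-insensitive via "i").
$pattern = '~
    \b https?:// \S* /                      # scheme, host and path up to the last "/"
    (?: [^\s/]+ [ ]+ )*?                    # space-separated words, as few as possible
    [^\s/]+ \. (?:docx?|pdf|jpe?g|png) \b   # final word ending in a known extension
~xi';

$text = 'Links: http://my-url.com/my doc.doc and '
      . 'http://my-url.com/my doc with more white spaces.doc';

preg_match_all($pattern, $text, $match);

foreach ($match[0] as $url) {
    echo $url, "\n";
}
```

The lazy *? makes each match end at the first word that carries a known extension, so two links on one line are returned separately rather than merged.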

Solution

Alright, after working through this really helpful tutorial I finally understand how regex syntax works. Afterwards I experimented a bit on this site.

It was pretty easy once I figured out that all hyperlinks in my parsed document were enclosed in double quotation marks, so I just had to change the regex to:

preg_match_all('#\bhttps?://[^()<>"]+#', $links, $match);

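Because the character class no longer excludes whitespace but does exclude the double quote, each match now includes any spaces and ends at the closing quotation mark; the search then continues with the next match that begins with http.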
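A small sketch of this pattern in action, assuming the links sit inside double-quoted href attributes as in my document (the HTML below is made up for illustration):

```php
<?php
// Assumed input: every link is enclosed in double quotes.
$links = '<a href="http://my-url.com/my doc.doc">first</a> '
       . '<a href="https://my-url.com/my doc with more white spaces.doc">second</a>';

// Spaces are allowed now; a match stops at the closing quote (or at ( ) < >).
preg_match_all('#\bhttps?://[^()<>"]+#', $links, $match);

print_r($match[0]);
```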

But that's not the full solution yet. The user Class was right: without running the filenames through rawurlencode, the links won't work.
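To see what rawurlencode does, and why presumably only the filename should be encoded rather than the whole URL, here is a quick check with a made-up filename:

```php
<?php
$file = 'my doc with more white spaces.doc';

// rawurlencode (RFC 3986 style) turns each space into %20.
echo rawurlencode($file), "\n";   // my%20doc%20with%20more%20white%20spaces.doc

// Encoding the whole URL would also escape ":" and "/" (as %3A and %2F),
// which breaks the link; so only the part after the last "/" is encoded.
echo rawurlencode('http://my-url.com/' . $file), "\n";
```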

So the next step was this:

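// Returns true if $haystack ends with $needle (an empty needle always matches).
function endsWith($haystack, $needle)
{
    return $needle === "" || substr($haystack, -strlen($needle)) === $needle;
}

// Only touch links that end in one of the known file extensions.
if (endsWith($textlink, ".doc") || endsWith($textlink, ".docx") || endsWith($textlink, ".pdf")
    || endsWith($textlink, ".jpg") || endsWith($textlink, ".jpeg") || endsWith($textlink, ".png")) {
    // Split the URL at the last slash: everything after it is the filename.
    $file = substr($textlink, strrpos($textlink, '/') + 1);
    $rest_url = substr($textlink, 0, strrpos($textlink, '/') + 1);
    // Encode only the filename (spaces become %20); the part before it is kept as-is.
    $textlink = $rest_url . rawurlencode($file);
}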

This extracts the filename from each URL and rawurlencodes it, so that the output links are correct.
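Both steps can be combined into one helper. This is only a sketch: the function name encode_file_links is hypothetical, it assumes the links are enclosed in double quotes as in my document, and it keeps the same case-sensitive extension check (so ".PDF" would be left unencoded). Spaces in directory names before the last "/" are not encoded either.

```php
<?php
// Hypothetical helper: extract quoted http(s) links and rawurlencode
// the filename part of those that end in a known extension.
function encode_file_links(string $text): array
{
    // Same extensions as above, compared case-sensitively.
    $extensions = ['.doc', '.docx', '.pdf', '.jpg', '.jpeg', '.png'];

    preg_match_all('#\bhttps?://[^()<>"]+#', $text, $match);

    $links = [];
    foreach ($match[0] as $link) {
        foreach ($extensions as $ext) {
            if (substr($link, -strlen($ext)) === $ext) {
                // Everything after the last slash is the filename.
                $pos = strrpos($link, '/') + 1;
                $link = substr($link, 0, $pos) . rawurlencode(substr($link, $pos));
                break;
            }
        }
        $links[] = $link;
    }
    return $links;
}

$html = '<a href="http://my-url.com/my doc with more white spaces.doc">doc</a>'
      . '<a href="http://my-url.com/page.html">page</a>';

print_r(encode_file_links($html));
```

The second link has no matching extension, so it is returned unchanged.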