Get video URL with regex for current expire time and token


I'm trying to get the current video URL from a page. The video URL carries an expire time and a token in this format: http://cdn.videourl.mp4?expire=1635939248&token=7022dbc14de970c7uc040ac4f35058f0

This is what I got so far:

$html = file_get_contents("http://www.videos.com/");

preg_match_all('/(http.*mp4\?[a-zA-Z]+=[0-9]+&[a-zA-Z]+=([0-9]+([a-zA-Z]+[0-9]+)+)',

    $html,
    $posts, // will contain the article data
    PREG_SET_ORDER // formats data into an array of posts
);

foreach ($posts as $post) {
    $link = $post[0];

    echo $link;
}

With this regex, /(http.*mp4(.*?))/, I can get the URL up to and including the .mp4, but not the query string after it.

What's missing in my regex to get the full URL? I also tried with this one (but I think something is missing...): /(http.*mp4\?[a-zA-Z]+=[0-9]+&[a-zA-Z]+=([0-9]+([a-zA-Z]+[0-9]+)+)


Solution

  • In your PHP example, the opening parenthesis at the start of the pattern is never closed, and the pattern passed to preg_match_all is missing its closing delimiter.
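
    As a small illustration of those two problems, the sketch below runs the pattern from the question as written, and then again with only the missing ")" and closing "/" added. The variable names ($broken, $fixed, $subject) are made up for this example, and preg_match is used instead of preg_match_all just to keep it short.

    ```php
    <?php
    // Pattern from the question: the "(" right after the opening delimiter is
    // never closed, and there is no closing "/" delimiter.
    $broken = '/(http.*mp4\?[a-zA-Z]+=[0-9]+&[a-zA-Z]+=([0-9]+([a-zA-Z]+[0-9]+)+)';

    // Same pattern with only the missing ")" and the closing "/" appended.
    $fixed = '/(http.*mp4\?[a-zA-Z]+=[0-9]+&[a-zA-Z]+=([0-9]+([a-zA-Z]+[0-9]+)+))/';

    $subject = 'http://cdn.videourl.mp4?expire=1635939248&token=7022dbc14de970c7uc040ac4f35058f0';

    // "@" hides the warning; preg_match() returns false when the pattern is invalid.
    $brokenResult = @preg_match($broken, $subject);

    // With the pattern repaired, preg_match() returns 1 and fills $m.
    $fixedResult = preg_match($fixed, $subject, $m);

    var_dump($brokenResult, $fixedResult);
    ```

    Checking the return value for false is a cheap way to notice an invalid pattern, since an empty result alone does not tell you whether the regex failed to compile or simply did not match.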

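    If the query string parameters always appear in this order, you can use one capture group for the value after the first equals sign and another for the value after the second equals sign. In the pattern, \S*? keeps the match from running across whitespace, and the dot before mp4 is escaped so that it matches a literal dot: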

    http\S*?\.mp4\?[a-zA-Z]+=([0-9]+)&[a-zA-Z]+=([0-9a-z]+)
    


    For example

    $html = "http://cdn.videourl.mp4?expire=1635939248&token=7022dbc14de970c7uc040ac4f35058f0";
    
    preg_match_all('/http\S*?\.mp4\?[a-zA-Z]+=([0-9]+)&[a-zA-Z]+=([0-9a-z]+)/',
        $html,
        $posts, // receives the matches
        PREG_SET_ORDER // one sub-array per match: full URL, then the two groups
    );
    
    var_export($posts);
    

    Output

    array (
      0 => 
      array (
        0 => 'http://cdn.videourl.mp4?expire=1635939248&token=7022dbc14de970c7uc040ac4f35058f0',
        1 => '1635939248',
        2 => '7022dbc14de970c7uc040ac4f35058f0',
      ),
    )
    

    If the order of the parameters is not fixed, you could also use an alternation with named capture groups that share the same names, together with the J flag, which allows duplicate group names:

    http\S*?\.mp4\?(?:expire=(?P<expire>[0-9]+)&token=(?P<token>[0-9a-z]+)|token=(?P<token>[0-9a-z]+)&expire=(?P<expire>[0-9]+))
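
    As a rough sketch of how this pattern might be used, the snippet below applies it with the J modifier to two sample URLs, one per parameter order. The second URL and the variable names ($pattern, $urls, $results) are made up for illustration, and writing J as a trailing modifier assumes PHP 7.2 or later.

    ```php
    <?php
    // The trailing J modifier allows the duplicate group names "expire" and "token".
    $pattern = '/http\S*?\.mp4\?(?:expire=(?P<expire>[0-9]+)&token=(?P<token>[0-9a-z]+)'
             . '|token=(?P<token>[0-9a-z]+)&expire=(?P<expire>[0-9]+))/J';

    // Sample input: the URL from the question, plus a hypothetical variant
    // with the two parameters swapped.
    $urls = [
        'http://cdn.videourl.mp4?expire=1635939248&token=7022dbc14de970c7uc040ac4f35058f0',
        'http://cdn.videourl.mp4?token=7022dbc14de970c7uc040ac4f35058f0&expire=1635939248',
    ];

    $results = [];
    foreach ($urls as $url) {
        if (preg_match($pattern, $url, $m)) {
            // Read the values by name, so the parameter order does not matter.
            $results[] = ['expire' => $m['expire'], 'token' => $m['token']];
        }
    }

    var_export($results);
    ```

    Reading the groups by name rather than by index means the calling code does not need to know which alternative matched.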
    



    Note that once you have the URL, it might be easier to get the key-value pairs using parse_url together with parse_str.

    For example

    parse_str(parse_url($html, PHP_URL_QUERY), $result);
    var_dump($result);
    

    Output

    array(2) {
      ["expire"]=>
      string(10) "1635939248"
      ["token"]=>
      string(32) "7022dbc14de970c7uc040ac4f35058f0"
    }
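
    To tie the two approaches together, here is a minimal sketch that first pulls the URL out of the page with a looser regex and then lets parse_url and parse_str split the query string. The HTML snippet is a made-up stand-in for the file_get_contents result, and the character class used to delimit the URL is an assumption about how the link is embedded in the page.

    ```php
    <?php
    // Hypothetical page content standing in for file_get_contents("http://www.videos.com/").
    $html = '<video src="http://cdn.videourl.mp4?expire=1635939248&token=7022dbc14de970c7uc040ac4f35058f0"></video>';

    $url = null;
    $params = [];

    // Match from "http" through ".mp4?" and the query string, stopping at
    // whitespace, quotes, or angle brackets (assumed URL boundaries).
    if (preg_match('/http[^\s"\'<>]*?\.mp4\?[^\s"\'<>]+/', $html, $match)) {
        $url = $match[0];
        // parse_url() isolates the query string; parse_str() turns it into an array.
        parse_str(parse_url($url, PHP_URL_QUERY), $params);
    }

    var_dump($url, $params);
    ```

    With this split, the regex only has to find where the URL starts and ends, and the parameter names and their order are handled by PHP's own URL parsing.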