php, regex, url, youtube, preg-match

Why does extracting the YouTube video ID not work?


I found code that gets the video ID from a YouTube URL. The script works well; however, if the ID starts with a hyphen "-", the script does not get the ID from the URL. I'm weak at regular expressions, but I still tried to adapt the expression and could not get it to work. Can you point me in the right direction or show me what I'm doing wrong? Thanks

My code:

$links = array(
'https://www.youtube.com/watch?v=-SXKV0jDxuA',
'https://www.youtube.com/watch?v=ylfhCpi9AEU'
);
foreach ($links as $link){
    preg_match("#([\/|\?|&]vi?[\/|=]|youtu\.be\/|embed\/)(\w+)#", $link, $matches);
    var_dump(end($matches));
}   //result => ylfhCpi9AEU

Solution

  • As for how I would improve your current pattern:

    ~(?:[/?&]vi?[/=]|youtu\.be/|embed/)\K[\w-]{10,12}~
    
    • This uses a different pattern delimiter -- a character that is not used in the pattern itself. This avoids having to escape characters in the pattern unnecessarily.
    • Using pipes (|) inside of character classes is not how character classes work. Character classes ([..]) are a list of characters or character ranges that are targeted. By writing | inside the character class, you are including | as a valid character, which is not intended.
    • \w is the equivalent of [A-Za-z0-9_], so your pattern is made more brief if you use it where appropriate.
    • \K restarts the fullstring match so that you don't need to use any capturing groups to extract the ID (this improves performance and reduces the output array bloat).
    • I am using a ranged quantifier on the ID substring (as other Stack Overflow users have done) to allow for expansion of the valid ID length. If my pattern becomes obsolete because of IDs that have a length greater than 12, just adjust the upper limit.
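
The points in the list above can be checked with a small sketch. The sample input '|v=abc' and the variable names are made up for illustration; the two URL patterns differ only in using a capture group versus \K.

```php
<?php
// 1) A pipe inside a character class is a literal character, not alternation.
//    '|v=abc' is a made-up input that should NOT look like a URL parameter.
$withPipes    = preg_match('#[\/|\?|&]v=#', '|v=abc'); // 1: the class accepts "|"
$withoutPipes = preg_match('~[/?&]v=~', '|v=abc');     // 0: only "/", "?", "&" allowed

// 2) Capture group versus \K on the same URL.
$url = 'https://www.youtube.com/watch?v=-SXKV0jDxuA';

preg_match('~(?:[/?&]vi?[/=]|youtu\.be/|embed/)([\w-]{10,12})~', $url, $captured);
preg_match('~(?:[/?&]vi?[/=]|youtu\.be/|embed/)\K[\w-]{10,12}~', $url, $restarted);

var_dump($withPipes, $withoutPipes);
print_r($captured);  // two elements: the fullstring match and group 1
print_r($restarted); // one element: the ID alone
```

With the capture group, the ID sits at index 1 behind a fullstring match that still contains "?v="; with \K, the fullstring match is already the ID.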

    As for how I would write the most inclusive pattern I can dream up (given all of the possible URL variations that I found lying around Stack Overflow):

    *Note, this doesn't check the front of your url. It assumes that you are only dealing with valid youtube urls.

    ~(?:[/?&](?:e|vi?|ci)(?:[/=]|%3D)|youtu\.be/|embed/|/user/[^/]+#p/(?:[^/]+/)+)\K[\w-]{10,12}~
    

    The pattern demo includes a long-ish list of YouTube URLs that I found. (I won't spell out all of the components of this pattern, because it may be overkill for the URLs that you are dealing with. If you or anyone else would like me to break it down, just ask.)
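
As a rough illustration of the inclusive pattern, here is a sketch that runs it against a handful of URL shapes. The URLs are examples picked for this sketch (reusing the two IDs from the question), not the list from the demo, and they do not exercise every branch of the pattern.

```php
<?php
// The inclusive pattern from the answer, unchanged.
$pattern = '~(?:[/?&](?:e|vi?|ci)(?:[/=]|%3D)|youtu\.be/|embed/|/user/[^/]+#p/(?:[^/]+/)+)\K[\w-]{10,12}~';

// Sample URL shapes chosen for illustration only.
$urls = [
    'https://www.youtube.com/watch?v=-SXKV0jDxuA',
    'https://youtu.be/-SXKV0jDxuA',
    'https://www.youtube.com/embed/ylfhCpi9AEU',
    'https://www.youtube.com/v/ylfhCpi9AEU',
    'https://www.youtube.com/watch?feature=player_embedded&v=ylfhCpi9AEU',
];

$found = [];
foreach ($urls as $url) {
    // Store null when nothing matches so failures stay visible.
    $found[$url] = preg_match($pattern, $url, $m) ? $m[0] : null;
}
print_r($found);
```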

    *To be perfectly clear, my patterns are not designed to VALIDATE youtube urls, but rather they are designed to EXTRACT IDs from valid youtube urls.
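
To make the extract-versus-validate distinction concrete, the sketch below feeds the pattern a URL on a different host. The helper looksLikeYoutubeHost() and its host list are hypothetical, shown only as one possible pre-check; they are not part of the answer's patterns and the list is not exhaustive.

```php
<?php
$pattern = '~(?:[/?&]vi?[/=]|youtu\.be/|embed/)\K[\w-]{10,12}~';

// example.com is not YouTube, yet the extraction pattern still "finds" an ID.
$foreign = 'https://example.com/watch?v=ylfhCpi9AEU';
$matched = preg_match($pattern, $foreign, $m);

// Hypothetical pre-check: an illustrative host allow-list, not an exhaustive one.
function looksLikeYoutubeHost(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    $allowed = ['www.youtube.com', 'youtube.com', 'youtu.be'];
    return is_string($host) && in_array(strtolower($host), $allowed, true);
}

var_dump($matched, $m[0], looksLikeYoutubeHost($foreign));
```

If the input may contain arbitrary URLs, a check along these lines would have to run before the extraction.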

    When used with preg_match(), you only need to access the first element of the matches array (the fullstring match at index [0]).
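
Applied to the loop from the question, that could look like the following sketch. Collecting the results in $ids and storing null for a failed match are choices made for this example only.

```php
<?php
$links = [
    'https://www.youtube.com/watch?v=-SXKV0jDxuA',
    'https://www.youtube.com/watch?v=ylfhCpi9AEU',
];

$ids = [];
foreach ($links as $link) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match('~(?:[/?&]vi?[/=]|youtu\.be/|embed/)\K[\w-]{10,12}~', $link, $matches)) {
        $ids[] = $matches[0]; // fullstring match; no end($matches) needed
    } else {
        $ids[] = null;        // no ID found in this link
    }
}
var_dump($ids);
```

Checking the return value before reading $matches[0] avoids the situation in the original loop, where end($matches) is called on an empty array when nothing matches.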