
PHP: preg_match_all Youtube video IDs from text


I want to extract YouTube URLs such as https://www.youtube.com/watch?time_continue=218&v=0EB7zh_7UE4 from a text, together with the video ID (here 0EB7zh_7UE4), so that I can inject text right after each URL based on its video ID. This is my sample text:

This is an example page will show up https://www.youtube.com/watch?time_continue=218&v=0EB7zh_7UE4 Bike https://www.youtube.com/watch?v=0EB7zh_7UE4&feature=youtu.be&app=desktop messenger by day, aspiring actor by night, and this is my website. I live in https://youtu.be/1EB7zh_7UE4 Los Angeles, have a great dog named Jack, and I https://www.youtube.com/watch?v=0EB7zh_7UE4&feature=youtu.be like piña coladasdoohickeys https://www.youtube.com/watch?v=4EB7zh_7UE4 you should go to <a href="http://example.com/wp-admin/">your dashboard</a> to delete this page and create new pages for your content. Have fun!

https://www.youtube.com/watch?v=0EB7zh_7UE4

more

https://www.youtube.com/watch?v=2EB7zh_7UE4&feature=youtu.be

That\'s all..

This is the regex I have so far, but it has two problems:

  • it adds (here) in the middle of the link instead of at the end. I want (here) added at the end of the YouTube URL

  • it injects (here) multiple times

See code:

function regex($sample_text) {
    if (preg_match_all('#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])(.*?)\b#s', $sample_text, $matches, PREG_SET_ORDER)) {
        print_r($matches);
        foreach ($matches as $match) {
            $add = ' (here)';
            $processed_text = str_replace($match[0], $match[0] . $add, $sample_text);
        }
    }
    return $processed_text;
}
echo regex($sample_test);

Where is my mistake?

Note: question + sample text have been updated.


Solution

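  • To expand on my comment: on every loop iteration you call str_replace on the original string, $sample_text, so each iteration throws away the previous replacement and only the last one survives. The fix is simple: initialise $processed_text to $sample_text before the loop, and run every replacement on $processed_text.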

    function regex($sample_text) {
        $processed_text = $sample_text;
        if (preg_match_all('#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])(.*?)\b#s', $sample_text, $matches, PREG_SET_ORDER)) {
            print_r($matches);
            foreach ($matches as $match) {
                $add = ' (here)';
                $processed_text = str_replace($match[0], $match[0] . $add, $processed_text);
            }
        }
        return $processed_text;
    }
    echo regex($sample_test);
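
One caveat worth checking, sketched here under the assumption that the same URL can occur more than once in the text (as it does in the sample): str_replace replaces every occurrence of the search string, so looping over the matches can append the suffix to a repeated URL once per match. The variable names are illustrative, and the $matches array is hand-written to stand in for what preg_match_all would return.

```php
<?php
// Illustrative input: the same URL occurs twice.
$url  = 'https://youtu.be/1EB7zh_7UE4';
$text = "first $url second $url end";

// Stand-in for the full matches preg_match_all would return (one entry per occurrence).
$matches = [$url, $url];

$processed = $text;
foreach ($matches as $match) {
    // str_replace rewrites ALL occurrences of $match, not just the current one,
    // so the second iteration appends the suffix to both URLs again.
    $processed = str_replace($match, $match . ' (here)', $processed);
}

echo $processed, PHP_EOL;
```

Doing the replacement in a single regex pass (for example with preg_replace_callback) is one way to avoid this.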
    

    Your regex also doesn't match to the end of the URL. For the sample text you provided, you could match everything up to the next whitespace character:

    '#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])\S*#s'
    

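    However, \S* does not stop at characters like " or . that directly follow a URL (a closing quote in an href, or a full stop at the end of a sentence), so they end up inside the match. You could exclude them by replacing \S with a negated character class. You seem to have a pretty good grasp of regex, so I'll assume you can work this out. If not, comment and I'll update my answer.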
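
As a rough sketch of that idea, the tail of the pattern can be a negated character class that stops at whitespace, quotes and angle brackets, and refuses to end on sentence punctuation. The helper name find_youtube_links is hypothetical, and two things differ from the pattern above by my own choice: the watch\?.+&v= branch is narrowed to watch\?(?:[^\s&]+&)*v= so it cannot run across whitespace, and the s flag is dropped. Which characters count as "not part of the URL" is an assumption you may want to adjust.

```php
<?php
// Hypothetical helper: lists each YouTube URL found in $text with its video ID.
function find_youtube_links(string $text): array
{
    $pattern = '#(?:https?://)?(?:m\.|www\.)?'
             . '(?:youtu\.be/|youtube-nocookie\.com/embed/'
             . '|youtube\.com/(?:embed/|v/|e/|\?v=|shared\?ci=|watch\?(?:[^\s&]+&)*v=))'
             . '([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])'
             // Tail: stop at whitespace, quotes and angle brackets, and do not
             // let the match end on punctuation such as "." or ",".
             . '(?:[^\s"\'<>]*[^\s"\'<>.,;:!?)])?#';

    if (!preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        return []; // no match, or a regex error
    }

    $links = [];
    foreach ($matches as $m) {
        // $m[0] is the whole URL, $m[1] the captured video ID.
        $links[] = ['url' => $m[0], 'id' => $m[1]];
    }
    return $links;
}

$demo = 'See <a href="https://youtu.be/1EB7zh_7UE4">clip</a> and '
      . 'https://www.youtube.com/watch?v=0EB7zh_7UE4&feature=youtu.be.';
print_r(find_youtube_links($demo));
```

In this demo the closing quote of the href and the final full stop are both left out of the matched URLs.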


    For completeness' sake, here is the complete code with my regex:

    function regex($sample_text) {
        $processed_text = $sample_text;
        if (preg_match_all('#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])\S*#s', $sample_text, $matches, PREG_SET_ORDER)) {
            print_r($matches);
            foreach ($matches as $match) {
                $add = ' (here)';
                $processed_text = str_replace($match[0], $match[0] . $add, $processed_text);
            }
        }
        return $processed_text;
    }
    echo regex($sample_test);
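
An alternative worth considering, offered as a sketch rather than a drop-in replacement: preg_replace_callback does the search and the replacement in one pass, so each URL is rewritten exactly once even when the same URL is repeated, and the callback gets the captured video ID. The function name annotate_youtube_urls and the $suffixFor parameter are hypothetical. The pattern is a variant of the one above with two assumptions of mine: the watch\?.+&v= branch is narrowed to watch\?(?:[^\s&]+&)*v=, and the tail stops at whitespace, quotes and angle brackets instead of using \S*.

```php
<?php
// Hypothetical helper: appends a suffix, built from the video ID, after each YouTube URL.
function annotate_youtube_urls(string $text, callable $suffixFor): string
{
    $pattern = '#(?:https?://)?(?:m\.|www\.)?'
             . '(?:youtu\.be/|youtube-nocookie\.com/embed/'
             . '|youtube\.com/(?:embed/|v/|e/|\?v=|shared\?ci=|watch\?(?:[^\s&]+&)*v=))'
             . '([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])'
             . '[^\s"\'<>]*#';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($suffixFor): string {
            // $m[0] is the full URL, $m[1] the video ID.
            return $m[0] . $suffixFor($m[1]);
        },
        $text
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

$sample = 'A https://www.youtube.com/watch?time_continue=218&v=0EB7zh_7UE4 '
        . 'B https://youtu.be/1EB7zh_7UE4 C';

echo annotate_youtube_urls($sample, function (string $id): string {
    return " (here: $id)";
}), PHP_EOL;
```

Because the suffix comes from a callback, the injected text can depend on the video ID, which is what the question asks for.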