
Change JS regex to work in Java


I came across this JS regex that retrieves the video ID from the YouTube URLs listed below.

/(youtu(?:\.be|be\.com)\/(?:.*v(?:\/|=)|(?:.*\/)?)([\w'-]+))/i

YouTube URLs tested on:

http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo

http://www.youtube.com/watch?v=cKZDdG9FTKY&feature=channel

http://www.youtube.com/watch?v=yZ-K7nCVnBI&playnext_from=TL&videos=osPknwzXEas&feature=sub

http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I

http://www.youtube.com/user/SilkRoadTheatre#p/a/u/2/6dwqZw0j_jY

http://youtu.be/6dwqZw0j_jY

http://www.youtube.com/watch?v=6dwqZw0j_jY&feature=youtu.be

http://youtu.be/afa-5HQHiAs

http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo?rel=0

http://www.youtube.com/watch?v=cKZDdG9FTKY&feature=channel

http://www.youtube.com/watch?v=yZ-K7nCVnBI&playnext_from=TL&videos=osPknwzXEas&feature=sub

http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I

http://www.youtube.com/embed/nas1rJpm7wY?rel=0

http://www.youtube.com/watch?v=peFZbP64dsU

How do I modify the regex to work in Java? Also, can it be altered to pick IDs from gdata URLs too, e.g. https://gdata.youtube.com/feeds/api/users/Test/?alt=json&v=2

Update: This is the function where I intend to use the Regex.

public static String getIDFromYoutubeURL(String ytURL ) {
    if(ytURL.startsWith("https://gdata")) {  // This is my obviously silly hack;
       ytURL = ytURL.replace("v=\\d", ""); // I believe the regex should handle this.
    }
    String pattern = "(?i)(https://gdata\\.)?(youtu(?:\\.be|be\\.com)/(?:.*v(?:/|=)|(?:.*/)?)([\\w'-]+))";
    Pattern compiledPattern = Pattern.compile(pattern);
    Matcher matcher = compiledPattern.matcher(ytURL);

    if(matcher.find()){
        return matcher.group(3);
    }
    return null;
}

Currently, it works fine for the URLs listed above and for https://gdata.youtube.com/feeds/api/users/Test/?id=c. However, it does not work when the gdata URL carries the version parameter, e.g. v=2 (https://gdata.youtube.com/feeds/api/users/Test/?id=c&v=2); in that case it returns 2 as the ID. How can it be improved to return Test and not 2 as the ID for the gdata URL? Thanks.


Solution

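  • I fixed it!
    Use replaceAll instead of replace. String.replace treats "v=\\d" as literal text, so it never matches and the URL is left unchanged; replaceAll interprets it as a regex and strips the version parameter: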

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    public class Test2 {
        public static void main(String[] args) {
            String toTest = getIDFromYoutubeURL(
                    "https://gdata.youtube.com/feeds/api/users/Test/?id=c&v=2");
            System.out.println(toTest);
        }
    
        public static String getIDFromYoutubeURL(String ytURL ) {
            if(ytURL.startsWith("https://gdata")) {  // Workaround: strip the version parameter first.
               ytURL = ytURL.replaceAll("v=\\d", ""); // replaceAll treats "v=\\d" as a regex.
            }
            String pattern = "(?i)(https://gdata\\.)?(youtu(?:\\.be|be\\.com)/(?:.*v(?:/|=)|(?:.*/)?)([\\w'-]+))";
            Pattern compiledPattern = Pattern.compile(pattern);
            Matcher matcher = compiledPattern.matcher(ytURL);
    
            if(matcher.find()){
                return matcher.group(3);
            }
            return null;
        }
    }
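
If you would rather not edit the URL before matching, one possible alternative is to test for gdata URLs with their own pattern and fall back to the original regex otherwise. This is only a minimal sketch: the class name YoutubeIdSketch and the GDATA_USER pattern are my own assumptions (the latter assumes the ID you want is always the path segment right after users/), and VIDEO_ID is the question's regex without the optional gdata prefix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YoutubeIdSketch {
    // Assumed pattern for gdata feed URLs: captures the path segment after "users/".
    private static final Pattern GDATA_USER =
            Pattern.compile("(?i)^https?://gdata\\.youtube\\.com/feeds/api/users/([\\w'-]+)");

    // The regex from the question, minus the optional gdata prefix group.
    private static final Pattern VIDEO_ID =
            Pattern.compile("(?i)youtu(?:\\.be|be\\.com)/(?:.*v(?:/|=)|(?:.*/)?)([\\w'-]+)");

    public static String getIDFromYoutubeURL(String ytURL) {
        // Try the gdata form first so that a trailing "v=2" is never read as an ID.
        Matcher gdata = GDATA_USER.matcher(ytURL);
        if (gdata.find()) {
            return gdata.group(1);
        }
        // Otherwise fall back to the ordinary video URL pattern.
        Matcher video = VIDEO_ID.matcher(ytURL);
        return video.find() ? video.group(1) : null;
    }

    public static void main(String[] args) {
        // prints "Test"
        System.out.println(getIDFromYoutubeURL(
                "https://gdata.youtube.com/feeds/api/users/Test/?id=c&v=2"));
        // prints "6dwqZw0j_jY"
        System.out.println(getIDFromYoutubeURL("http://youtu.be/6dwqZw0j_jY"));
    }
}
```

Keeping the two patterns separate means the version parameter never reaches the video regex, so no replaceAll step is needed. gdata URLs with a different path layout would need their own pattern.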