regex, url, web, language-agnostic, imgur

How would I detect what is an Imgur picture link, and what isn't?


I'm trying to programmatically determine whether a link points to an Imgur image. Examples of Imgur image links are http://imgur.com/0AKSCQ4 and http://i.imgur.com/0AKSCQ4.jpg (the first is an indirect link and the second is direct, but the ID stays the same).

I want http://imgur.com/0AKSCQ4 to evaluate to true when checked as an Imgur image link, but http://imgur.com/gallery to evaluate to false. I'm not sure how to distinguish between the two when both have the form imgur.com/*letters*.

I ask because I know Reddit Enhancement Suite has this functionality. If I post http://imgur.com/gallery, it doesn't offer an image button to preview it, but it does for http://imgur.com/0AKSCQ4.

So how can I identify this? Listing every word that doesn't qualify, like gallery, jobs, or about in imgur.com/*whatever*, seems really hacky and would break as soon as a new page is added. And there aren't always numbers in the second part, so I can't rely on that to identify it.


Solution

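  • JavaScript example (uses jQuery): the snippet scans this question's text for URL-like substrings and tests each one against an Imgur image pattern. The pattern is a heuristic that requires at least one digit in the ID.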

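    $(function () {

        /* pattern for URL-like substrings; the g flag lets exec() walk every match */
        var url_re = /https?[^<"]+/g;

        /* Imgur check: optional subdomain, an ID containing at least one digit,
           and an optional three-letter extension (a heuristic, not an official rule) */
        var imgur_re = /^https?:\/\/(\w+\.)?imgur\.com\/\w*\d\w*(\.[a-zA-Z]{3})?$/;

        var txt = $(".post-text").html(); /* taking this question text as input */

        var m; /* declared locally so it does not leak into the global scope */
        while ((m = url_re.exec(txt)) !== null) { /* match all URL-like substrings in input */

            /* m[0] is the matched URL; show it together with the test result */
            $("#results").append("<li>" + m[0] + ": " + imgur_re.test(m[0]) + "</li>");
        }

    });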
    <ul id="results"></ul>
    
    <div class="post-text" itemprop="text">
    <p>I'm trying to programmatically figure out whether or not an link is a link to an Imgur image or not. An example of an Imgur image link would be: <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a> or <a href="http://i.imgur.com/0AKSCQ4.jpg" rel="nofollow">http://i.imgur.com/0AKSCQ4.jpg</a> (the first is an indirect link and the latter is direct, but the ID stays the same)</p>
    
    <p>I want <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a> to evaluate to <code>true</code> when asked if an Imgur link, but <a href="http://imgur.com/gallery" rel="nofollow">http://imgur.com/gallery</a> to be <code>false</code>. I'm confused how to distinguish between those two when they're both <code>imgur.com/*letters*</code>.</p>
    
    <p>I ask because I know <a href="http://redditenhancementsuite.com" rel="nofollow">Reddit Enhancement Suite</a> has this functionality. If I post <a href="http://imgur.com/gallery" rel="nofollow">http://imgur.com/gallery</a> it doesn't offer an image button to preview it, but it would for <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a></p>
    
    <p>So how would I be able to identify this? Finding every word that doesn't qualify, like <code>gallery</code>, <code>jobs</code>, or <code>about</code> in <code>imgur.com/*whatever*</code> would seem really hacky, and would break upon any new page being added. And there's not <em>always</em> numbers in the second part so I can't rely on that to identify it.</p>
    </div>
    
    
    <script type="text/javascript" src="https://code.jquery.com/jquery-2.1.1.min.js"></script>
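
If you don't need the page-scanning part, the same check can be packaged as a plain function with no jQuery dependency. This is only a sketch of the heuristic used above: the name `isImgurImageLink` is hypothetical, and the assumption that an image ID always contains at least one digit comes from the pattern itself, not from anything Imgur documents. An ID made only of letters would be rejected, which is the limitation raised in the question.

```javascript
// Hypothetical helper (not part of the original answer).
// Assumption: an Imgur image ID is a run of word characters containing
// at least one digit, while site pages such as /gallery or /about contain none.
const IMGUR_IMAGE_RE = /^https?:\/\/(\w+\.)?imgur\.com\/\w*\d\w*(\.[a-zA-Z]{3})?$/;

// Returns true when the whole string looks like an Imgur image link.
function isImgurImageLink(url) {
  return IMGUR_IMAGE_RE.test(url);
}

console.log(isImgurImageLink("http://imgur.com/0AKSCQ4"));       // indirect link
console.log(isImgurImageLink("http://i.imgur.com/0AKSCQ4.jpg")); // direct link
console.log(isImgurImageLink("http://imgur.com/gallery"));       // site page
```

With the URLs from the question, the first two calls should report a match and the gallery page should not.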