
Regular Expression to find <img /> tag which doesn't have alt=".#"


I need a regular expression that matches an image tag whose alt attribute is missing or empty. For instance, it should return an img tag that has alt="" or has no alt at all, but not one that has alt="y".

The image tags might have line breaks in them, and there could be more than one image tag per line.

Currently, what I have is:

<img.@(~[\r\n]|[\r\n])*.@(~(alt=".#"))*.@(~[\r\n]|[\r\n])*.@/>

and I'm testing it on this:

<img alt="" />
<img src="xyz.jpg"
alt="y" />
<img xxxx ABC /> 
<img xxxxxx ABC />
<img src="xyz.jpg" alt="y" />

But my regex returns every image tag, including the 2nd and 5th ones, which I don't want returned.

I'm working in Microsoft Expression Web.


Solution

  • Your best bet is to use jQuery to parse the string into HTML nodes and then filter them with a selector.

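    // Parse the markup into DOM nodes, then keep only the <img> elements
    // that have no alt attribute or an empty one.
    var str = '<img alt="" /><img src="xyz.jpg" alt="y" /><img xxxx ABC /> <img xxxxxx ABC /><img src="xyz.jpg" alt="y" />';
    var elementsWithoutAlt = $( str ).filter( 'img:not([alt]), img[alt=""]' );
    console.log(elementsWithoutAlt.length); // 3

    ':not([alt])' matches elements without an alt attribute, and 'img:not([alt])' restricts that to img elements. 'img[alt=""]' matches img elements whose alt attribute is empty, which the question also wants returned, so the two selectors are combined with a comma.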

    Demo: (Click render to see it in action) http://jsbin.com/imeyam/3/edit

    jQuery documentation for the Has Attribute selector: http://api.jquery.com/has-attribute-selector/
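If the search has to stay a regular expression (for example, in an editor's Find dialog), a negative lookahead can express "this tag has no non-empty alt". The sketch below is a minimal illustration in JavaScript regex syntax, not in the Expression Web syntax used in the question's pattern; it assumes alt values are double-quoted and that no attribute value contains a `>` character. The names `imgWithoutNonEmptyAlt` and `sample` are just for illustration.

```javascript
// Match "<img" followed by anything up to the closing ">", but only if
// no alt="<one or more characters>" appears before that ">".
// [^>] also matches line breaks, so tags split across lines still match.
// Assumption: alt values are double-quoted and contain no ">".
const imgWithoutNonEmptyAlt = /<img\b(?![^>]*\balt\s*=\s*"[^"]+")[^>]*>/gi;

// The test input from the question.
const sample = `<img alt="" />
<img src="xyz.jpg"
alt="y" />
<img xxxx ABC />
<img xxxxxx ABC />
<img src="xyz.jpg" alt="y" />`;

// With the g flag, match() returns every matching tag, including several
// tags on the same line.
const matches = sample.match(imgWithoutNonEmptyAlt);
console.log(matches.length); // 3: the alt="" tag and the two tags with no alt
```

A single-quoted value such as alt='y', or a `>` inside an attribute value, would defeat this pattern, which is one reason parsing the markup and filtering with a selector is the more robust approach.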