regex string regular-language

How to remove a line based on a regular expression with exceptions


I need to remove lines that have the following characteristics.

<img src="index-1_2.jpg"/><br>
<img src="index-1_3.jpg"/><br>
<img src="index-1_5.jpg"/><br>
<img src="index-2_1.jpg"/><br>
<img src="index-2_5.jpg"/><br>
<img src="index-3_1.png"/><br>
<img src="index-23_8.png"/><br>
<img src="index-22_9.png"/><br>
<img src="index-22_1.jpg"/><br>
<img src="index-22_2.jpg"/><br>
<img src="index-99_5.png"/><br>
<img src="index-100_5.png"/><br>
<img src="index-1000_5.png"/><br>
...

As you can see, the number after index-, the number after the _, and the image format (png, jpg) all vary.

I need a regex that removes all these lines EXCEPT those with specific numbers after index-. For example, I need to keep only the lines whose index number is exactly 1 or 2.

I have the following regular expression:

^<img src="index-(?!2|1)\d+_\d+\.(?:jpg|png)"\/><br>$

However, although I only want to keep the numbers 1 and 2, it also keeps 22, 23, 100 and 1000, since they begin with those digits.
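
The behaviour described above can be reproduced with a short sketch. Python is an assumption here, since the question does not name a language or tool, and the names `original`, `samples`, `removed` and `kept` are illustrative only. The lookahead `(?!2|1)` inspects just the first digit of the index, so any index that starts with 1 or 2 escapes the match.

```python
import re

# Pattern copied from the question. The negative lookahead (?!2|1) only
# checks the single character that follows "index-".
original = re.compile(r'^<img src="index-(?!2|1)\d+_\d+\.(?:jpg|png)"\/><br>$')

# A few of the sample lines from the question.
samples = [
    '<img src="index-1_2.jpg"/><br>',
    '<img src="index-22_9.png"/><br>',
    '<img src="index-100_5.png"/><br>',
    '<img src="index-3_1.png"/><br>',
]

# A line that matches the pattern is a line that would be removed.
removed = [s for s in samples if original.match(s)]
kept = [s for s in samples if not original.match(s)]

print("removed:", removed)
print("kept:", kept)
```

Only the index-3 line is matched for removal; the index-22 and index-100 lines survive along with index-1, which is the unwanted result.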


Solution

  • Use

    ^<img src="index-(?![12]_)(\d+)_\d+\.(?:jpg|png)"\/><br>$
    

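    See regex proof. Because the lookahead requires the _ right after the digit, only an index of exactly 1 or 2 is excluded from the match. To delete the matching lines, replace them with an empty string; the index number is captured in $1 if the replacement needs it.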

    EXPLANATION

    --------------------------------------------------------------------------------
      ^                        the beginning of the string
    --------------------------------------------------------------------------------
      <img src="index-         '<img src="index-'
    --------------------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
    --------------------------------------------------------------------------------
        [12]                     any character of: '1', '2'
    --------------------------------------------------------------------------------
        _                        '_'
    --------------------------------------------------------------------------------
      )                        end of look-ahead
    --------------------------------------------------------------------------------
      (                        group and capture to \1:
    --------------------------------------------------------------------------------
        \d+                      digits (0-9) (1 or more times (matching
                                 the most amount possible))
    --------------------------------------------------------------------------------
      )                        end of \1
    --------------------------------------------------------------------------------
      _                        '_'
    --------------------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times (matching
                               the most amount possible))
    --------------------------------------------------------------------------------
      \.                       '.'
    --------------------------------------------------------------------------------
      (?:                      group, but do not capture:
    --------------------------------------------------------------------------------
        jpg                      'jpg'
    --------------------------------------------------------------------------------
       |                        OR
    --------------------------------------------------------------------------------
        png                      'png'
    --------------------------------------------------------------------------------
      )                        end of grouping
    --------------------------------------------------------------------------------
      "                        '"'
    --------------------------------------------------------------------------------
      \/                       '/'
    --------------------------------------------------------------------------------
      ><br>                    '><br>'
    --------------------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
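
As a minimal sketch of applying the pattern above to a whole block of text, assuming Python's `re` module (the question does not name a tool): `re.MULTILINE` is needed so that `^` and `$` match at every line rather than only at the ends of the string. The trailing `\n?` is an addition that is not part of the answer's pattern; it removes the line break together with the line so no empty lines are left behind. The names `pattern`, `text` and `cleaned` are illustrative.

```python
import re

# Pattern from the answer. (?![12]_) rejects the match only when the index
# is exactly 1 or 2, because the digit must be followed directly by "_".
# Assumption: "\n?" is appended so the newline is deleted with the line.
pattern = re.compile(
    r'^<img src="index-(?![12]_)(\d+)_\d+\.(?:jpg|png)"\/><br>$\n?',
    re.MULTILINE,
)

text = "\n".join([
    '<img src="index-1_2.jpg"/><br>',
    '<img src="index-2_5.jpg"/><br>',
    '<img src="index-3_1.png"/><br>',
    '<img src="index-22_9.png"/><br>',
    '<img src="index-100_5.png"/><br>',
    '<img src="index-1000_5.png"/><br>',
]) + "\n"

# Replace every matching line with nothing; lines for index 1 and 2 remain.
cleaned = pattern.sub("", text)
print(cleaned, end="")
```

To keep a different set of index numbers, the character class can be swapped for an alternation, for example `(?!(?:1|2|15)_)`, with the `_` still anchoring the end of the number.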