Using grep/regex, I am trying to pull img tags out of a file. I only want img tags that contain 'photobucket' in the source, and I do not want img tags that do not contain photobucket.
Want:
<img src="/photobucket/img21.png">
Do Not Want:
<img src="/imgs/test.jpg">
<img src="/imgs/thiswillgetpulledtoo.jpg"><p>We like photobucket</p>
What I have tried:
(<img.*?photobucket.*?>)
This did not work, because it also matched the second example under "Do Not Want": there was a 'photobucket' later on the line, followed by a closing bracket. How can I check for 'photobucket' only up to the first closing bracket and, if it is not there, skip that tag and move on?
'photobucket' may be in different locations within the string.
grep -o '<img[^>]*src="[^"]*photobucket[^>]*>' infile
The -o option returns only the matched parts of each line. Split up, the pattern reads:
<img # Start with <img
[^>]* # Zero or more of "not >"
src=" # start of src attribute
[^"]* # Zero or more of "not quotes"
photobucket # Match photobucket
[^>]* # Zero or more of "not >"
> # Closing angle bracket
For the input file
<img src="/imgs/test.jpg">
<img src="/imgs/thiswillgetpulledtoo.jpg"><p>We like photobucket</p>
<img src="/photobucket/img21.png">
<img alt="photobucket" src="/something/img21.png">
<img alt="something" src="/photobucket/img21.png">
<img src="/photobucket/img21.png" alt="something">
<img src="/something/img21.png" alt="photobucket">
this returns
$ grep -o '<img[^>]*src="[^"]*photobucket[^>]*>' infile
<img src="/photobucket/img21.png">
<img alt="something" src="/photobucket/img21.png">
<img src="/photobucket/img21.png" alt="something">
The non-greedy .*? works only with the -P option (Perl-compatible regexes).
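To see why the bracket expressions matter more than laziness, here is a small sketch that runs both patterns side by side. It assumes GNU grep built with PCRE support (needed for -P); the temporary directory and the variable names lazy and bracket are only for illustration.

```shell
# Build a small sample file (a subset of the input shown above).
tmpdir=$(mktemp -d)
infile="$tmpdir/infile"
cat > "$infile" <<'EOF'
<img src="/imgs/test.jpg">
<img src="/imgs/thiswillgetpulledtoo.jpg"><p>We like photobucket</p>
<img src="/photobucket/img21.png">
<img alt="photobucket" src="/something/img21.png">
EOF

# Pattern from the question, with -P so that .*? is really lazy.
# A lazy .*? still matches any character, including '>', so it can
# run past the end of the img tag to reach a later 'photobucket'.
lazy=$(grep -oP '<img.*?photobucket.*?>' "$infile")

# Pattern from the answer: [^>]* cannot leave the tag, and [^"]*
# cannot leave the src attribute value.
bracket=$(grep -o '<img[^>]*src="[^"]*photobucket[^>]*>' "$infile")

printf 'lazy:\n%s\n\nbracket:\n%s\n' "$lazy" "$bracket"

rm -r "$tmpdir"
```

With this sample, the bracket version should print only the tag whose src contains photobucket, while the lazy version also returns the "We like photobucket" line and the tag that has photobucket only in its alt attribute. So -P alone does not solve the original problem; restricting what each part may match does.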