Tags: regex, grep, textwrangler

Grep: Select All Image Names in a Block of Text


I've searched multiple sources for a grep or regex pattern that selects all image names in a massive collection of garbled code and text. The closest I've come is "How to Use grep to find '../images/'", which didn't work for me.

I need to select the first occurrence of all image names (or copy all image names to a separate file) in my source file, so that, for example:

/Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_ABanner.gif

would select only

someurl.com_images_ABanner.gif

Here's a sample of the text that I am attempting to search through:

[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/banners/ABanner.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_banners_ABanner.gif : Not Found
[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/randy.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_randy.jpg : Not Found
[fg-joomla-to-wordpress] Can't copy http://www.differenturl.com/images-body0/logo2.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/www.differenturl.com_images-body0_logo2.gif : Not Found
[fg-joomla-to-wordpress] Can't copy /images/DiffImage.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DiffImage.jpg : A valid URL was not provided.
[fg-joomla-to-wordpress] Can't copy /images/DSCN0248.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DSCN0248.jpg : A valid URL was not provided.

I recognize that the first path in each line (the source) contains /images/, with some exceptions such as /images-body0/imagename.jpg, while the second path (the target) does not. That should simplify the match, but I just can't get it.


Solution

  • How about this, using sed with extended (-E) regular expressions? It selects the image file name (jpg, gif, or png) that sits immediately before the " : " separator in each line of your input, which is the last path component of the target.

    $ sed -nE 's,^.*/([^/]*\.(jpg|gif|png)) : .*$,\1,p' file
    someurl.com_images_banners_ABanner.gif
    someurl.com_images_randy.jpg
    www.differenturl.com_images-body0_logo2.gif
    images_DiffImage.jpg
    images_DSCN0248.jpg
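
Since the question also mentions copying the names to a separate file, here is a minimal sketch of how that could look. The file names `migration.log` and `image-names.txt` are made up for illustration, and the `sort -u` step assumes duplicate names are unwanted. The literal `\.` before the extension is a small tightening so that only names really ending in .jpg, .gif, or .png match. Two sample lines from the question are written to the log first so the snippet runs on its own.

```shell
# Hypothetical file names, not from the question.
log=migration.log
out=image-names.txt

# Two sample lines from the question, so the sketch is self-contained.
cat > "$log" <<'EOF'
[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/randy.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_randy.jpg : Not Found
[fg-joomla-to-wordpress] Can't copy /images/DSCN0248.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DSCN0248.jpg : A valid URL was not provided.
EOF

# How the pattern reads:
#   ^.*/                     greedy: everything up to the last slash that still lets the rest match
#   ([^/]*\.(jpg|gif|png))   capture a slash-free name ending in an image extension
#    : .*$                   the " : " separator and the trailing error message
# -n plus the trailing p prints only lines where the substitution succeeded.
sed -nE 's,^.*/([^/]*\.(jpg|gif|png)) : .*$,\1,p' "$log" | sort -u > "$out"

cat "$out"
```

Because the source URL contains slashes after its host name while the target file name does not, the greedy `^.*/` always lands on the target's last path component, so the `/images/` exceptions in the source URL never need special handling.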