Tags: bash, shell, find, batch-processing, webp

Find jpg images without an extra webp suffix


I have a directory with a few million images randomly placed inside other subdirectories. I want to generate webp images for all jpg images by appending the webp extension, leaving alone other formats such as gif images.

I can run the command below on Ubuntu 18 to generate all the webp images I want. Each output file is named after its source file, with a .webp suffix appended:

find /home/photos -type f \( -iname \*.jpg -o -iname \*.jpeg \) | parallel --eta cwebp {} -o {}.webp

However, over time I'll add more jpg images to other subdirectories, and I want to run the same command again only for the new jpg images that don't yet have a corresponding .webp file.

If I have:

-- 1.png
-- 1.gif
-- 2.jpg
-- 2.jpg.webp
-- 3.jpg
-- subdir/4.jpg
-- subdir/5.jpg
-- subdir/5.jpg.webp

How do I find only 3.jpg and subdir/4.jpg (the ones without a webp version)?

Furthermore, searching by time is not possible because the new photos may have an older modification time than the last run.


Solution

  • You could test for the existence of the output file within parallel and create it only if it doesn't already exist, like this:

    find . -iname \*.jpg | parallel --eta 'out={}.webp; [ ! -f "$out" ] && cwebp {} -o "$out"'
    

    Or, exactly the same, but trying harder to be less negative in my outlook:

    find . -iname \*.jpg | parallel --eta 'out={}.webp; [ -f "$out" ] || cwebp {} -o "$out"'
    

    :-)
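
If you would rather have just the list of files (which is what the question literally asks for), the same existence test can be done by find itself. This is only a minimal sketch: `list_missing_webp` is a hypothetical helper name I made up, it assumes a find that supports `-iname` (GNU and BSD find do), and it prints one path per line, so file names containing newlines would not survive.

```shell
# list_missing_webp DIR
# Print every .jpg/.jpeg file under DIR (default: current directory)
# that has no "<name>.webp" file sitting next to it.
# NOTE: list_missing_webp is a hypothetical helper name, not part of any tool.
list_missing_webp() {
    find "${1:-.}" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) -exec sh -c '
        # find passes batches of matching paths as positional parameters
        for f; do
            [ -e "$f.webp" ] || printf "%s\n" "$f"
        done' sh {} +
}
```

With the example tree from the question, `list_missing_webp .` should print ./3.jpg and ./subdir/4.jpg (in whatever order find visits them). The output can then be piped into the original conversion command, for example `list_missing_webp /home/photos | parallel --eta cwebp {} -o {}.webp`.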