I'm trying to recombine multiple images into a single image using a command line tool like ImageMagick.
I have 30,000 folders, and each folder contains nearly 50 images. The images are tiles of a larger image that has been split up. Each image is prefixed with its xy position,
e.g. folder1/01-imagename folder1/02-imagename folder1/03-imagename folder1/10-imagename and so on
example here.
00-zzi.....=x0-y0-z2.jpg
01-zzi.....=x0-y1-z2.jpg
02-zzi.....=x0-y2-z2.jpg
03-zzi.....=x0-y3-z2.jpg
Each tile image is 512x512 and is typically less than 50 kB.
I'm trying to figure out whether ImageMagick's composite capability is the right tool for this; any other suggestions are welcome.
Thanks!
Debian GNU/Linux 11
identify image output:
Image:
Filename: 00-zzi3ZROdz3Lq8hTj7hy2ghoChBAv2D2-9bU_jPT-D4b_jXTraQfK81DEuQ=x0-y0-z2.jpg
Format: JPEG (Joint Photographic Experts Group JFIF format)
Mime type: image/jpeg
Class: DirectClass
Geometry: 512x512+0+0
Units: Undefined
Colorspace: sRGB
Type: Grayscale
Base type: Undefined
Endianness: Undefined
Depth: 8-bit
Channel depth:
red: 8-bit
green: 8-bit
blue: 8-bit
Channel statistics:
Pixels: 262144
Red:
min: 0 (0)
max: 19 (0.0745098)
mean: 3.06709 (0.0120278)
standard deviation: 0.61571 (0.00241455)
kurtosis: 28.6716
skewness: 2.14642
entropy: 0.251511
Green:
min: 0 (0)
max: 19 (0.0745098)
mean: 3.06709 (0.0120278)
standard deviation: 0.61571 (0.00241455)
kurtosis: 28.6716
skewness: 2.14642
entropy: 0.251511
Blue:
min: 0 (0)
max: 19 (0.0745098)
mean: 3.06709 (0.0120278)
standard deviation: 0.61571 (0.00241455)
kurtosis: 28.6716
skewness: 2.14642
entropy: 0.251511
Image statistics:
Overall:
min: 0 (0)
max: 19 (0.0745098)
mean: 3.06709 (0.0120278)
standard deviation: 0.61571 (0.00241455)
kurtosis: 28.6717
skewness: 2.14643
entropy: 0.251511
Colors: 18
Histogram:
560: (0,0,0) #000000 black
2814: (1,1,1) #010101 srgb(1,1,1)
15055: (2,2,2) #020202 srgb(2,2,2)
212826: (3,3,3) #030303 grey1
24467: (4,4,4) #040404 srgb(4,4,4)
5004: (5,5,5) #050505 grey2
896: (6,6,6) #060606 srgb(6,6,6)
237: (7,7,7) #070707 srgb(7,7,7)
113: (8,8,8) #080808 grey3
72: (9,9,9) #090909 srgb(9,9,9)
43: (10,10,10) #0A0A0A grey4
24: (11,11,11) #0B0B0B srgb(11,11,11)
14: (12,12,12) #0C0C0C srgb(12,12,12)
7: (13,13,13) #0D0D0D grey5
5: (14,14,14) #0E0E0E srgb(14,14,14)
3: (15,15,15) #0F0F0F grey6
1: (16,16,16) #101010 srgb(16,16,16)
3: (19,19,19) #131313 srgb(19,19,19)
Rendering intent: Perceptual
Gamma: 0.454545
Chromaticity:
red primary: (0.64,0.33)
green primary: (0.3,0.6)
blue primary: (0.15,0.06)
white point: (0.3127,0.329)
Background color: white
Border color: srgb(223,223,223)
Matte color: grey74
Transparent color: black
Interlace: None
Intensity: Undefined
Compose: Over
Page geometry: 512x512+0+0
Dispose: Undefined
Iterations: 0
Compression: JPEG
Quality: 90
Orientation: Undefined
Properties:
date:create: 2022-09-07T00:51:18+00:00
date:modify: 2022-09-07T00:51:18+00:00
jpeg:colorspace: 2
jpeg:sampling-factor: 2x2,1x1,1x1
signature: ebb4af08227671b45fa62c44887f9b94a8a17d3a7d6c418c26be0e032b766359
Artifacts:
filename: 00-zzi3ZROdz3Lq8hTj7hy2ghoChBAv2D2-9bU_jPT-D4b_jXTraQfK81DEuQ=x0-y0-z2.jpg
verbose: true
Tainted: False
Filesize: 3764B
Number pixels: 262144
Pixels per second: 67.2793MB
User time: 0.000u
Elapsed time: 0:01.003
Version: ImageMagick 6.9.11-60 Q16 x86_64 2021-01-25 https://imagemagick.org
Hi Mark, thank you for your help so far. I have been doing some testing and I am very nearly there! I had to change the code that gets the list of images to use egrep, because it was not finding the files. I have changed it to:
row=( $(ls | egrep *-y${y}-z 2> /dev/null) )
The final hurdle is that when I attempt to process a smaller directory of 10 folders as a test of parallel processing:
find "tiled_images" -type d -print ./processOne {}
It does not seem to print the folder names after the command, and instead shows:
find: paths must precede expression.
As I see it, there are two aspects to this:
IMHO, the best way to process 30,000 directories is in parallel, else you'll be there all day. So I would suggest writing the processing as a script that handles one directory, passed as a single parameter, and then using a GNU Parallel job to process all 30,000 directories, keeping all your CPU cores busy until every directory is done.
So, if your directories are under a top-level directory called "tiled_images", and you save the script in the next part of my answer as processOne.sh, you could do this:
find "tiled_images" -type d -print | parallel ./processOne.sh {}
GNU Parallel has many options; here are a few of the most useful:
parallel --eta ... shows you the "Estimated Time of Arrival" of job completion
parallel --bar ... shows you a progress bar, and works with zenity
parallel -j 4 ... runs just 4 jobs at a time
parallel -j 50% ... keeps half your CPU cores busy
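As a rough sketch of how the options above might be combined: note that `find ... -type d` also lists the top-level directory itself, which presumably holds no tiles, so this version skips it with `-mindepth 1`. It assumes the tile folders sit directly under the top-level directory and that processOne.sh is in the current directory. The function names `list_tile_dirs` and `run_all` are made up for illustration.

```shell
#!/bin/bash
# Sketch only: run processOne.sh over every tile directory in parallel.
# Assumptions: tile folders are direct children of the top-level directory,
# and ./processOne.sh exists and is executable.

# Print the tile directories NUL-separated so names containing spaces survive.
# -mindepth 1 skips the top-level directory itself.
list_tile_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -print0
}

# Feed the list to GNU Parallel; -0 tells it the input is NUL-separated.
run_all() {
    list_tile_dirs "$1" | parallel -0 --bar -j 50% ./processOne.sh {}
}
```

Calling `run_all tiled_images` would then start the whole job. Trying it first on a small copy of the data is probably wise.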
Now to the processing of a single directory, whose name is passed as a parameter:
#!/bin/bash

# Expect exactly one parameter - the directory name
[ $# -eq 1 ] || { >&2 echo "Usage: $0 DIRECTORY"; exit 1; }
d=$1
cd "$d" || { >&2 echo "ERROR: Directory $d does not exist"; exit 1; }

# Assume no more than 100 rows of tiles since fewer than 50 images altogether, and presumably more than 1 image per row
for ((y=0;y<100;y++)) ; do
    # Get list of images in this row
    row=( $(ls *-y${y}-z2.jpg 2> /dev/null) )
    # Break out of loop if no images
    [ -z "$row" ] && break
    # Formulate output filename for this row, being sure that it is zero-padded so the rows collate in correct order
    # Also, write to MPC, or Magick Pixel Cache format, which should be fastest to write and read later
    printf -v out "row-%02d.mpc" "$y"
    echo "Processing row: ${y}"
    echo " concatenating: ${row[@]}"
    echo " into: ${out}"
    magick "${row[@]}" +append "$out"
done

# Concatenate rows into result
magick row-*mpc -append result.jpg

# You should clean up here when it is tested
# rm *.mpc *.cache
You would then save this as processOne.sh and make it executable with:
chmod +x processOne.sh
Then test it on a single directory with:
./processOne.sh SOME_DIRECTORY_CONTAINING_TILES
Note that +append concatenates images side-by-side, whereas -append (different sign) concatenates images above-and-below each other.
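To see the difference between the two signs without touching real tiles, something like the following could be tried. It is only a sketch: the tiny 4x2 test images and the function name `append_demo` are made up. It also assumes that ImageMagick 6 (such as the 6.9.11 shown in the question) provides only `convert`, while version 7 provides `magick`, so it uses whichever is found.

```shell
#!/bin/bash
# Sketch only: compare +append and -append on two tiny solid-colour images.
# Assumption: ImageMagick is installed; v7 has "magick", v6 only "convert".
im=$(command -v magick || command -v convert)

append_demo() {
    local dir=$1
    # Two 4x2 test tiles (hypothetical stand-ins for the real 512x512 tiles)
    "$im" -size 4x2 xc:red  "$dir/a.png"
    "$im" -size 4x2 xc:blue "$dir/b.png"
    # +append joins left-to-right, -append joins top-to-bottom
    "$im" "$dir/a.png" "$dir/b.png" +append "$dir/side_by_side.png"
    "$im" "$dir/a.png" "$dir/b.png" -append "$dir/stacked.png"
    # Print width x height of each result: 8x2, then 4x4
    "$im" "$dir/side_by_side.png" "$dir/stacked.png" -format '%wx%h\n' info:
}
```

Running `append_demo "$(mktemp -d)"` should print the two sizes, one per line.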
Note that you could speed this up by avoiding the creation of intermediate files, but that might complicate things and make debugging harder. Just for reference, that would look something like this:
...
...
# Assume no more than 100 rows of tiles since fewer than 50 images altogether, and presumably more than 1 image per row
for ((y=0;y<100;y++)) ; do
    # Get list of images in this row
    row=( $(ls *-y${y}-z2.jpg 2> /dev/null) )
    # Break out of loop if no images
    [ -z "$row" ] && break
    # Append images to make a row and pass to outer `magick` command
    magick "${row[@]}" +append miff:-
done | magick miff:- -append result.jpg
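One thing to watch in the "get list of images in this row" step: `ls` orders the tiles by filename, which matches x order only while the prefixes sort the same way as the numbers. If a grid ever had ten or more columns, a helper along these lines might be safer. It is a sketch that assumes every tile name ends in `=x<X>-y<Y>-z2.jpg`, as in the listing in the question; the function name `row_tiles` is made up.

```shell
#!/bin/bash
# Sketch only: list the tiles of one row ordered by numeric x, not by filename.
# Assumption: every tile name ends in "=x<X>-y<Y>-z2.jpg".
row_tiles() {
    local y=$1 f x
    for f in *"=x"*"-y${y}-z2.jpg"; do
        [ -e "$f" ] || continue      # the glob matched nothing
        x=${f##*=x}                  # drop everything up to the last "=x"
        x=${x%%-*}                   # keep only the digits before "-y"
        printf '%s\t%s\n' "$x" "$f"  # emit "x<TAB>filename"
    done | sort -t $'\t' -k1,1n | cut -f2-
}
```

In the script, `mapfile -t row < <(row_tiles "$y")` could then stand in for the `row=( $(ls ...) )` line. That form also avoids word-splitting on filenames containing spaces, though it needs bash 4 or later.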