I'm learning shell scripting and am trying to stay as POSIX-compliant as possible while keeping the code reasonably readable. The goal is to read a list of files from directory A, find their matches in directory B, and recreate the matching portion of directory B's sub-directory structure in directory C, where the files from directory A should be moved. The matched files are then removed from directory B, and any sub-directories of B left empty as a result are removed as well.
All files in directory A are unique. Each one always has one or more matches in directory B and never a match in directory C, although the matching sub-directories may already exist in directory C. Extensions change because the files are processed separately, but the filenames otherwise match exactly. Filenames may contain spaces and periods, and are not always the same length. There are two levels of sub-directories in the output and archive directories.
Here's what I've got so far. I'm getting stuck on writing the for-loop to do the dirty work. Trying not to step too far outside of find, printf, awk, grep, for, and if.
#!/bin/sh
execHome="intendedMachine"
baseDir="/home/library/projects"
folderNew="output"
folderOld="working"
folderArchive="archive"
workingTypes=("jpg", "svg", "bmp", "tiff", "psd")
$folderNew="$baseDir/$folderNew"
$folderOld="$baseDir/$folderOld"
folderArchive="$baseDir/$folderArchive"
if [ "$(uname -n)" = "$execHome" ]
then
    count=$(find $folderNew -type f |grep -v "DS_Store" |awk -F "/" '{print $NF}'|wc -l)
    printf "\nFound/processing %s files in the %s folder\n\n" "$count" "$folderNew"
    find $folderNew -type f |grep -v "DS_Store" |awk -F "/" '{print $NF}'
else
    printf "Executed from %s; Run from %s for proper execution.\n" "$(uname -n)" "$execHome"
fi
Example:
Directory A
/home/library/projects/output/projectOne 1.a.png
/home/library/projects/output/projectOne 1.b.png
/home/library/projects/output/projectOne 1.c.png
/home/library/projects/output/projectThree 3.m.png
/home/library/projects/output/projectThree 3.o.png
/home/library/projects/output/projectFour 4.t.png
/home/library/projects/output/projectFour 4.u.png
Directory B
/home/library/projects/working/House/2018 01/projectOne 1.a.jpg
/home/library/projects/working/House/2018 01/projectOne 1.a.svg
/home/library/projects/working/House/2018 01/projectOne 1.b.jpg
/home/library/projects/working/House/2018 01/projectOne 1.b.svg
/home/library/projects/working/House/2018 01/projectOne 1.c.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.g.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.g.svg
/home/library/projects/working/House/2018 02/projectTwo 2.h.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.h.svg
/home/library/projects/working/House/2018 02/projectTwo 2.i.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.m.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.n.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.o.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.o.svg
/home/library/projects/working/Car/2018 04/projectFour 4.s.jpg
/home/library/projects/working/Car/2018 04/projectFour 4.t.jpg
/home/library/projects/working/Car/2018 04/projectFour 4.u.jpg
Directory C
/home/library/projects/archive/House/2018 01/projectOne 1.d.png
/home/library/projects/archive/House/2018 01/projectOne 1.e.png
/home/library/projects/archive/House/2018 01/projectOne 1.f.png
/home/library/projects/archive/Car/2018 03/projectThree 3.p.png
/home/library/projects/archive/Car/2018 03/projectThree 3.q.png
/home/library/projects/archive/Car/2018 03/projectThree 3.r.png
Desired outcome:
Directory A files have been moved to Directory C
/home/library/projects/output/
Directory B should have Directory A files removed and empty folders deleted
/home/library/projects/working/House/2018 02/projectTwo 2.g.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.g.svg
/home/library/projects/working/House/2018 02/projectTwo 2.h.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.h.svg
/home/library/projects/working/House/2018 02/projectTwo 2.i.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.n.jpg
/home/library/projects/working/Car/2018 04/projectFour 4.s.jpg
Directory C should contain both old archives and new output files as archives
/home/library/projects/archive/House/2018 01/projectOne 1.a.png
/home/library/projects/archive/House/2018 01/projectOne 1.b.png
/home/library/projects/archive/House/2018 01/projectOne 1.c.png
/home/library/projects/archive/House/2018 01/projectOne 1.d.png
/home/library/projects/archive/House/2018 01/projectOne 1.e.png
/home/library/projects/archive/House/2018 01/projectOne 1.f.png
/home/library/projects/archive/Car/2018 03/projectThree 3.m.png
/home/library/projects/archive/Car/2018 03/projectThree 3.o.png
/home/library/projects/archive/Car/2018 03/projectThree 3.p.png
/home/library/projects/archive/Car/2018 03/projectThree 3.q.png
/home/library/projects/archive/Car/2018 03/projectThree 3.r.png
/home/library/projects/archive/Car/2018 04/projectFour 4.t.png
/home/library/projects/archive/Car/2018 04/projectFour 4.u.png
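To experiment with the example without touching real data, the layout above can be recreated in a scratch directory. The following is only a sketch under stated assumptions: the scratch root comes from mktemp -d (widely available, but not guaranteed on every system), the helper name touch_p is made up for illustration, and every file is created empty.

```shell
#!/bin/sh
# Assumption: a throwaway directory from mktemp -d stands in for
# /home/library/projects, so nothing real is touched.
root=$(mktemp -d) || exit 1

# Hypothetical helper: create an empty file, making its parent directories first.
touch_p() {
    mkdir -p -- "${1%/*}" && : > "$1"
}

# One relative path per line; IFS= read -r keeps spaces and periods intact.
while IFS= read -r rel; do
    touch_p "$root/$rel"
done <<'EOF'
output/projectOne 1.a.png
output/projectOne 1.b.png
output/projectOne 1.c.png
output/projectThree 3.m.png
output/projectThree 3.o.png
output/projectFour 4.t.png
output/projectFour 4.u.png
working/House/2018 01/projectOne 1.a.jpg
working/House/2018 01/projectOne 1.a.svg
working/House/2018 01/projectOne 1.b.jpg
working/House/2018 01/projectOne 1.b.svg
working/House/2018 01/projectOne 1.c.jpg
working/House/2018 02/projectTwo 2.g.jpg
working/House/2018 02/projectTwo 2.g.svg
working/House/2018 02/projectTwo 2.h.jpg
working/House/2018 02/projectTwo 2.h.svg
working/House/2018 02/projectTwo 2.i.jpg
working/Car/2018 03/projectThree 3.m.jpg
working/Car/2018 03/projectThree 3.n.jpg
working/Car/2018 03/projectThree 3.o.jpg
working/Car/2018 03/projectThree 3.o.svg
working/Car/2018 04/projectFour 4.s.jpg
working/Car/2018 04/projectFour 4.t.jpg
working/Car/2018 04/projectFour 4.u.jpg
archive/House/2018 01/projectOne 1.d.png
archive/House/2018 01/projectOne 1.e.png
archive/House/2018 01/projectOne 1.f.png
archive/Car/2018 03/projectThree 3.p.png
archive/Car/2018 03/projectThree 3.q.png
archive/Car/2018 03/projectThree 3.r.png
EOF

printf 'test tree created under %s\n' "$root"
```

Pointing baseDir at the printed directory lets a script be tried against the sample layout; the directory can be deleted afterwards.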
Ran the code anyway from a bash 4.4.19 machine to see how it does, but it didn't work quite like I expected. Here's the resultant output:
Found/processing 4 files in the /home/library/projects/output folder
./auto-archive.sh: line 34: hash["$proj"]: bad array subscript
parent of /home/library/projects/output/.temp/projectThree 3.m.png not found
parent of /home/library/projects/output/projectOne 1.a.png not found
parent of /home/library/projects/output/.temp/projectThree 3.0.png not found
parent of /home/library/projects/output/projectFour 4.t.png not found
My apologies. I also didn't mention earlier that Directory B should not be scanned recursively; in my use case a recursive scan picks up other temporary files that are still being written and may not yet be ready to move. Also, for the purposes of testing, only the four files listed above were actually in Directory A, not all the files listed initially. Further, after I recreated the proposed test structure, your code executed flawlessly there, which does not match the results from my actual file structure. I fear I may have missed some crucial element in describing my actual file structure or naming convention, and I'm reviewing it now for differences. Sorry to be taking up your time, but I'm certainly impressed with your accuracy. It feels like we're getting close, but I definitely need this to run on an earlier version of bash.
The task can be divided into three steps:
1. Create a map that associates each filename (project name) with its parent directory name in C. This is a preparation stage that analyzes the pathnames in B. It uses an associative array, so the bash version must be 4.2 or newer.
2. Loop over the files in A, compose the destination pathname in C using the map created in step 1, move each file there, and remove the matching files in B.
3. As a clean-up stage, remove any empty directories in B.
Then how about:
#!/bin/bash
execHome="intendedMachine"
baseDir="/home/library/projects"
folderNew="output"
folderOld="working"
folderArchive="archive"
workingTypes=("jpg" "svg" "bmp" "tiff" "psd")
declare -A hash
folderNew="$baseDir/$folderNew"
folderOld="$baseDir/$folderOld"
folderArchive="$baseDir/$folderArchive"
if [ "$(uname -n)" != "$execHome" ]; then
    printf "Executed from %s; Run from %s for proper execution.\n" "$(uname -n)" "$execHome"
    exit 1      # non-zero status: the script did not do its job
fi
count=$(find "$folderNew" -type f |grep -v "DS_Store" |awk -F "/" '{print $NF}'|wc -l)
printf "\nFound/processing %s files in the %s folder\n\n" "$count" "$folderNew"
# determine parent directory name for each project name and create a map for them
while IFS= read -r -d $'\0' f; do
    proj="${f##*/}"                 # remove dirname
    proj="${proj%.*}"               # remove extension
    [ -n "$proj" ] || continue      # skip names like .DS_Store: an empty key is a bad array subscript
    parent="${f##*$baseDir/}"       # remove pathname until $baseDir
    parent="${parent#*/}"           # strip pathname one-level deeper
    parent="${parent%/*}"           # remove filename
    # now we're mapping "projectOne 1.a" => "House/2018 01" e.g.
    # echo "$proj" "=>" "$parent"   # just for debugging
    hash["$proj"]="$parent"
done < <(find "$folderOld" -type f -print0)     # directory B
# iterate over files in A; move to archive directory C and remove files in B
while IFS= read -r -d $'\0' f; do
    proj="${f##*/}"
    proj="${proj%.*}"
    [ -n "$proj" ] || continue      # skip names like .DS_Store (empty key)
    parent="${hash[$proj]}"
    if [[ "$parent" = "" ]]; then
        echo "parent of $f not found"   # may not occur but just in case ..
    else
        # move from A to C
        destdir="$folderArchive/$parent"
        mkdir -p -- "$destdir"
        mv -- "$f" "$destdir"
        # remove relevant file(s) in B
        for ext in "${workingTypes[@]}"; do
            oldfile="$folderOld/$parent/$proj.${ext}"
            [ -f "$oldfile" ] && rm -f -- "$oldfile"
        done
    fi
done < <(find "$folderNew" -type f -print0)     # directory A
# clean-up: remove empty dirs in B
find "$folderOld" -type d -empty -print0 | xargs -r -0 rmdir --
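One edge case is worth checking against the "bad array subscript" message reported in the question. A name that starts with a dot and contains no other dot, such as .DS_Store, becomes an empty string once the extension is stripped, and bash rejects an empty associative-array subscript. Whether that is what happened on the asker's machine is an assumption, but it would be consistent with the error. The sketch below only manipulates strings; the sample paths and the function name project_name are hypothetical.

```shell
#!/bin/sh
# Derive the project name the same way the script does:
# drop the directory part, then drop the last ".ext".
project_name() {
    name="${1##*/}"
    printf '%s\n' "${name%.*}"
}

# Hypothetical sample paths; nothing is read from disk.
for f in "/tmp/demo/.DS_Store" "/tmp/demo/projectOne 1.a.png"; do
    proj=$(project_name "$f")
    if [ -z "$proj" ]; then
        # An empty name cannot be used as a key, so skip the file.
        printf 'skip (empty project name): %s\n' "$f"
    else
        printf 'project: %s\n' "$proj"
    fi
done
```

A test such as [ -z "$proj" ] before the array assignment keeps such files out of the map.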
Explanations:
- Do not put $ before the variable name on the left-hand side of an assignment.
- The while IFS= ... done < <(find ...) syntax is an idiom for looping over the output of find.
- The ${parameter#word} type of syntax is a parameter expansion, used here to extract a substring from the path.
- hash maps each project name, such as "projectOne 1.a", to its parent directory name, such as "House/2018 01".
- The -- in some commands guards against filenames that start with -. (This protection may look pathological...)
If your bash is older than 4.2, let me know. Then we need to find an alternative.
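To make the parameter expansions concrete, here is a small sketch that applies the same five expansions to one sample path from the question's Directory B listing. It only manipulates strings, so it can be run anywhere; the comments show the value after each step.

```shell
#!/bin/sh
baseDir="/home/library/projects"
# Sample path copied from the question's Directory B listing.
f="/home/library/projects/working/House/2018 01/projectOne 1.a.jpg"

proj="${f##*/}"             # longest prefix ending in "/" removed: "projectOne 1.a.jpg"
proj="${proj%.*}"           # shortest suffix starting with "." removed: "projectOne 1.a"
parent="${f##*$baseDir/}"   # "working/House/2018 01/projectOne 1.a.jpg"
parent="${parent#*/}"       # shortest prefix ending in "/" removed: "House/2018 01/projectOne 1.a.jpg"
parent="${parent%/*}"       # shortest suffix starting with "/" removed: "House/2018 01"

printf '%s => %s\n' "$proj" "$parent"
```

Because % removes only the shortest matching suffix, the periods inside "projectOne 1.a" survive and only the final extension is dropped.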
EDIT
Here's the POSIX compliant version as an alternative:
(Apparently the script does not work if the filenames contain a newline or an escape character, \x1b.)
#!/bin/sh
execHome="intendedMachine"
baseDir="/home/library/projects"
folderNew="output"
folderOld="working"
folderArchive="archive"
workingTypes="jpg
svg
bmp
tiff
psd"
folderNew="$baseDir/$folderNew"
folderOld="$baseDir/$folderOld"
folderArchive="$baseDir/$folderArchive"
nl="
" # set to newline character
esc=$(printf '\033') # set to escape character (printf is portable; echo -ne is not)
#esc=":" # if \033 does not work well, try another character
# substitute of reading a hash
# it relies on the context that IFS is set to $nl
# ("local" is not POSIX, so it is not used; read_lut is only ever called
# inside $(...), i.e. in a subshell, so its variables do not leak)
read_lut() {
    ret=""
    for i in $lut; do
        key="${i%${esc}*}"
        val="${i#*${esc}}"
        if [ "$key" = "$1" ]; then
            # loop until the end and use the last value
            ret="$val"
        fi
    done
    printf '%s\n' "$ret"
}
# substitute of writing to a hash
write_lut() {
    lut=$(printf "%s\n%s%c%s" "$lut" "$1" "$esc" "$2")
}
if [ "$(uname -n)" != "$execHome" ]; then
    printf "Executed from %s; Run from %s for proper execution.\n" "$(uname -n)" "$execHome"
    exit 1      # non-zero status: the script did not do its job
fi
count=$(find "$folderNew" -type f |grep -v "DS_Store" |awk -F "/" '{print $NF}'|wc -l)
printf "\nFound/processing %s files in the %s folder\n\n" "$count" "$folderNew"
# determine parent directory name for each project name and create a map for them
ifs_bak="$IFS"
IFS="$nl"
for f in $(find "$folderOld" -type f); do
    proj="${f##*/}"                 # remove dirname
    proj="${proj%.*}"               # remove extension
    parent="${f##*$baseDir/}"       # remove pathname until $baseDir
    parent="${parent#*/}"           # strip pathname one-level deeper
    parent="${parent%/*}"           # remove filename
    # now we're mapping "projectOne 1.a" => "House/2018 01" e.g.
    # echo "$proj" "=>" "$parent"   # just for debugging
    write_lut "$proj" "$parent"
done
# iterate over files in A; move to archive directory C and remove files in B
for f in $(find "$folderNew" -type f); do
    proj="${f##*/}"
    proj="${proj%.*}"
    parent=$(read_lut "$proj")
    if [ "$parent" = "" ]; then
        echo "parent of $f not found"   # may not occur but just in case ..
    else
        # move from A to C
        destdir="$folderArchive/$parent"
        mkdir -p -- "$destdir"
        mv -- "$f" "$destdir"
        # remove relevant file(s) in B
        for ext in $workingTypes; do
            oldfile="$folderOld/$parent/$proj.${ext}"
            [ -f "$oldfile" ] && rm -f -- "$oldfile"
        done
    fi
done
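# clean-up: remove empty dirs in B
# (-empty is not in POSIX find; rmdir fails harmlessly on non-empty directories,
# and -depth visits children first so a parent emptied by this pass is removed too)
find "$folderOld" -depth -type d -exec rmdir -- {} \; 2>/dev/null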
# restore IFS
IFS="$ifs_bak"
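The lookup-table emulation can also be tried on its own. The following is a minimal sketch, not the script's exact code: it sets IFS and disables globbing inside read_lut rather than relying on the caller, and the underscore-prefixed variable names are my own. Keys and values are assumed to contain neither a newline nor the escape character.

```shell
#!/bin/sh
nl='
'                       # a single newline, used as the record separator
esc=$(printf '\033')    # escape character, used as the key/value separator
lut=""

# Append one "key<esc>value" record to the table.
write_lut() {
    lut=$(printf '%s\n%s%s%s' "$lut" "$1" "$esc" "$2")
}

# Print the value stored for key $1 (the last one wins), or an empty line.
read_lut() {
    _ret=""
    _old_ifs=$IFS
    IFS=$nl             # split the table on newlines only
    set -f              # do not glob-expand the records
    for _i in $lut; do
        if [ "${_i%${esc}*}" = "$1" ]; then
            _ret="${_i#*${esc}}"
        fi
    done
    set +f
    IFS=$_old_ifs
    printf '%s\n' "$_ret"
}

write_lut "projectOne 1.a" "House/2018 01"
write_lut "projectThree 3.m" "Car/2018 03"
read_lut "projectOne 1.a"
read_lut "projectThree 3.m"
```

Each lookup scans the whole table, so the cost grows with the number of files in B; that is the price of avoiding associative arrays.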