I have a deeply nested folder structure and I want to reduce the nesting by two levels by removing two sub-directory levels completely, while moving all the data under them up into the parent folder. Example 1, current path:
folder1/folder2/id=123/date_from=2022-10-12/date_to=2023-10-12/sys_date_from=12%1A22%3A19.489/sys_date_to=8888-12-31 23%3A59%3A59.999/slice=watermelon/binary_file_name.parquet
Path I want after removing the two levels:
folder1/folder2/id=123/date_from=2022-10-12/sys_date_to=8888-12-31 23%3A59%3A59.999/slice=watermelon/binary_file_name.parquet
Here I have removed the two levels date_to=2023-10-12 and sys_date_from=12%1A22%3A19.489 from the path, but no data is lost: whatever was under them has been moved up into their parent folder.
The folder2 name and the values after the equals sign (=) in the nested folders keep changing.
For example I can also have:
folder1/folder10/id=456/date_from=2021-04-01/date_to=2022-03-11/sys_date_from=08%8G22%3B19.909/sys_date_to=8888-12-31 23%3A59%3A59.999/slice=apple/binary_file_name2.parquet
After removing the date_to and sys_date_from folders, my file structure should look like this:
folder1/folder10/id=456/date_from=2021-04-01/sys_date_to=8888-12-31 23%3A59%3A59.999/slice=apple/binary_file_name2.parquet
I looked at getting this working via the find command, but I am struggling to get anything coherent. How do I do this with a bash command or shell script? I want to do this in an automated fashion for all folders under folder1; there could be as many as ten folders under folder1.
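#!/bin/bash
(
    # Match dotfiles too, and expand unmatched globs to nothing
    shopt -s dotglob nullglob
    for top in folder1/folder*/id=*/date_from=* ; do
        # Skip this directory if we cannot enter it, so mv never runs in the wrong place
        pushd "$top" || continue
        for sub in date_to=*/sys_date_from=* ; do
            # Move everything up into $top, then remove the two emptied levels
            mv "$sub"/* .
            rmdir "$sub" "${sub%/*}"
        done
        popd
    done
)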
shopt -s dotglob nullglob:
- nullglob so globs matching nothing return nothing - makes glob expansion return a (possibly-empty) list of paths that all exist (by default, "x*" as output could be an actual path (character "x" followed by literal asterisk), or it might indicate a glob expansion failure)
- dotglob so that during glob expansion, a leading * in a path segment will match names that start with a dot (.)

The rest of the script:
- outer loop over each date_from=* directory (top)
- pushd (popd) - change directory into (out of) top to allow use of shorter relative paths in the inner loop
- inner loop over each date_to=*/sys_date_from=* directory (sub) (relative to directory top)
- mv moves the contents of sub to the current directory (i.e. top)
- rmdir removes sub and its parent - ${sub%/*} strips the final path element

The surrounding parentheses ( ... ) aren't really needed; they localise the shopt changes if the program is extended, and insulate code appended after them from pushd/popd failure (which should be checked). They also allow testing the code by cut&paste to an interactive shell without permanently changing its settings.
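The effect of the two shell options and of the ${sub%/*} expansion can be checked in a scratch directory before touching real data. The following is only a minimal sketch: the scratch directory and the names no_such_dir, d, .hidden and visible are made up for illustration and are not part of the original layout.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing real is touched (all names are illustrative).
start=$PWD
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Default behaviour: a glob that matches nothing is passed through as a literal string.
shopt -u nullglob dotglob
for p in no_such_dir/*; do
    echo "default: $p"              # prints: default: no_such_dir/*
done

# nullglob: the same glob expands to an empty list, so the loop body never runs.
shopt -s nullglob
iterations=0
for p in no_such_dir/*; do
    iterations=$((iterations + 1))
done
echo "nullglob iterations: $iterations"     # prints: nullglob iterations: 0

# dotglob: * also matches names that start with a dot.
mkdir d
touch d/.hidden d/visible
shopt -u dotglob
plain=(d/*)
shopt -s dotglob
dotted=(d/*)
echo "without dotglob: ${#plain[@]}, with dotglob: ${#dotted[@]}"

# ${sub%/*} removes the shortest trailing "/..." match, i.e. the last path element.
sub='date_to=2023-10-12/sys_date_from=12%1A22%3A19.489'
parent=${sub%/*}
echo "$parent"                      # prints: date_to=2023-10-12

# Clean up the scratch directory.
cd "$start" && rm -rf "$scratch"
```

Without dotglob the array holds only d/visible; with it, d/.hidden is included as well, which is why the main script enables it before running mv "$sub"/* .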
Assumes date_to=2023-10-12 only contains sys_date_from=12%1A22%3A19.489; it can't be deleted if it isn't empty.
If date_to=2023-10-12 contains multiple sys_date_from=[...] subdirectories, it will eventually get deleted, but rmdir will give error messages along the way. These can be avoided by nesting another level, e.g.: for sub1 in date_to=*; do for sub2 in "$sub1"/sys_date_from=*; ...
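Spelled out, the extra level of nesting might look like the sketch below. The function name flatten_nested and its single argument (the directory that plays the role of folder1) are hypothetical names introduced here for illustration; the loop structure follows the sub1/sub2 hint above. It still assumes that nothing being moved collides with a name already present in the date_from=* directory.

```shell
#!/bin/bash
# flatten_nested ROOT - hypothetical helper; ROOT stands in for folder1.
# The ( ... ) function body runs in a subshell, so shopt and pushd stay local to it.
flatten_nested() (
    shopt -s dotglob nullglob
    for top in "$1"/folder*/id=*/date_from=*; do
        pushd "$top" >/dev/null || continue
        for sub1 in date_to=*; do
            for sub2 in "$sub1"/sys_date_from=*; do
                contents=("$sub2"/*)
                # nullglob yields an empty array for an empty directory: skip mv then.
                if ((${#contents[@]})); then
                    mv -- "${contents[@]}" .
                fi
                rmdir -- "$sub2"
            done
            # Runs once per date_to=* directory, after all of its children are gone,
            # so rmdir no longer complains about a non-empty parent.
            rmdir -- "$sub1"
        done
        popd >/dev/null
    done
)
```

It would be called as flatten_nested folder1 from the directory that contains folder1.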
Also assumes file permissions allow the pushd and mv; adding appropriate error-checking is advisable.
Also add checks for overwriting unless the subdirectory contents are guaranteed to have unique names when moved.
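One way to add both kinds of check is sketched below. This is a variation rather than the script above: it avoids pushd by working with full paths, moves entries one at a time, and refuses to move anything whose name already exists at the destination. The function name flatten_checked and the "skip" message are hypothetical choices made for this illustration.

```shell
#!/bin/bash
# flatten_checked ROOT - hypothetical helper; ROOT stands in for folder1.
# Returns 0 if everything was moved, 1 if anything was skipped or failed.
flatten_checked() (
    shopt -s dotglob nullglob
    status=0
    for sub in "$1"/folder*/id=*/date_from=*/date_to=*/sys_date_from=*; do
        mid=${sub%/*}       # the date_to=* directory
        top=${mid%/*}       # the date_from=* directory
        for item in "$sub"/*; do
            target=$top/${item##*/}
            # Never overwrite: leave the item where it is and report it.
            if [[ -e $target || -L $target ]]; then
                echo "skip: $target already exists" >&2
                status=1
                continue
            fi
            mv -- "$item" "$target" || status=1
        done
        # rmdir only removes empty directories, so skipped items keep their parents.
        rmdir -- "$sub" 2>/dev/null || { status=1; continue; }
        # Fails harmlessly while sibling sys_date_from=* directories remain.
        rmdir -- "$mid" 2>/dev/null || true
    done
    exit "$status"
)
```

It would be called as flatten_checked folder1; a non-zero exit status means at least one entry was left in place and needs to be looked at by hand.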