
bash or shell script remove subdirectories but keep all nested data


I have a deeply nested folder structure and I want to reduce the nesting by two levels: remove two sub-directory levels completely, but move all the data under them up into the parent folder. Example 1, current path:

folder1/folder2/id=123/date_from=2022-10-12/date_to=2023-10-12/sys_date_from=12%1A22%3A19.489/sys_date_to=8888-12-31 23%3A59%3A59.999/slice=watermelon/binary_file_name.parquet

The path I want after removing the two levels:

folder1/folder2/id=123/date_from=2022-10-12/sys_date_to=8888-12-31 23%3A59%3A59.999/slice=watermelon/binary_file_name.parquet

Here I have removed the two levels date_to=2023-10-12 and sys_date_from=12%1A22%3A19.489 from the path, but no data is lost: whatever was under them has been moved up into their parent folder.

The folder2 part and the values after the equals sign (=) in the nested folder names keep changing.

For example I can also have:

folder1/folder10/id=456/date_from=2021-04-01/date_to=2022-03-11/sys_date_from=08%8G22%3B19.909/sys_date_to=8888-12-31 23%3A59%3A59.999/slice=apple/binary_file_name2.parquet

After removing the date_to and sys_date_from folders, my file structure should look like this:

folder1/folder10/id=456/date_from=2021-04-01/sys_date_to=8888-12-31 23%3A59%3A59.999/slice=apple/binary_file_name2.parquet

I looked at getting this working via the find command, but I am struggling to get anything coherent. How do I do this with a bash command or a shell script? I want to do this in an automated fashion for all folders under folder1; there could be as many as ten folders under folder1.


Solution

  • #!/bin/bash
    (
        shopt -s dotglob nullglob
        for top in folder1/folder*/id=*/date_from=* ; do
            pushd "$top"
            for sub in date_to=*/sys_date_from=* ; do
                mv "$sub"/* .
                rmdir "$sub" "${sub%/*}"
            done
            popd
        done
    )
    
    • enable nullglob so that a glob matching nothing expands to nothing; expansion then always yields a (possibly empty) list of paths that all exist (by default an unmatched glob is left unchanged, so an output of x* could be an actual path (character "x" followed by a literal asterisk) or could indicate a glob expansion failure)
    • enable dotglob so that during glob expansion, a leading * in a path segment will match names that start with .
    • loop over list of all paths generated by expanding the first glob (top)
      • pushd (popd) - change directory into (out of) top to allow use of shorter relative paths in the inner loop
      • loop over list of paths generated by expanding the second glob (sub) (relative to directory top)
        • move all contents of sub to the current directory (i.e. top)
        • delete sub and its parent - ${sub%/*} strips the final path element
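
The two shell features these bullets lean on can be tried in isolation. The following is only a small sketch: it assumes bash, runs in a throwaway directory created with mktemp (so no file starting with "x" exists there), and the variable names are made up for the demonstration.

```bash
#!/bin/bash
# Demonstrates nullglob and the ${var%/*} expansion in an empty scratch directory.
demo_dir=$(mktemp -d)          # hypothetical scratch location, not part of the answer
cd "$demo_dir" || exit 1

shopt -u nullglob
default_result=(x*)            # no match: the pattern is kept as a literal word
shopt -s nullglob
nullglob_result=(x*)           # no match: the list is empty

sub='date_to=2023-10-12/sys_date_from=12%1A22%3A19.489'
parent=${sub%/*}               # remove the shortest trailing match of "/*"

echo "default:  ${#default_result[@]} word(s): ${default_result[*]}"
echo "nullglob: ${#nullglob_result[@]} word(s)"
echo "parent:   $parent"
```

With nullglob off, the for loops in the script would run once with the unexpanded pattern as the "path"; with it on, they simply do not run.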

    The surrounding parentheses ( ... ) aren't strictly needed; they run the code in a subshell, which localises the shopt changes if the program is extended and insulates any code appended after them from a pushd/popd failure (which should still be checked). They also allow testing the code by pasting it into an interactive shell without permanently changing its settings.
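
The effect of the parentheses can be seen with a few lines. This is an illustrative sketch assuming bash; it changes an option and the working directory inside ( ... ) and shows that neither change survives outside. The variable start_dir is introduced only for the demonstration.

```bash
#!/bin/bash
shopt -u nullglob                      # start from the default setting
start_dir=$PWD                         # remember where we started
(
    shopt -s nullglob                  # changed only inside the subshell
    cd / || exit 1                     # the same holds for the working directory
    shopt -q nullglob && echo "inside:  nullglob on,  cwd $PWD"
)
shopt -q nullglob || echo "outside: nullglob off, cwd $PWD"
```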


    Assumes date_to=2023-10-12 only contains sys_date_from=12%1A22%3A19.489; it can't be deleted if it isn't empty.

    If date_to=2023-10-12 contains multiple sys_date_from=[...] subdirectories, it will eventually get deleted, but rmdir will print error messages along the way. These can be avoided by nesting another level, e.g.: for sub1 in date_to=*; do for sub2 in "$sub1"/sys_date_from=*; ...
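
One way the extra nesting level might look is sketched below. It is not the answer's own code: the function name flatten_two_levels and its argument (the top directory, e.g. folder1) are hypothetical, and it still assumes that the items being moved up have unique names.

```bash
#!/bin/bash
# Sketch: flatten date_to=*/sys_date_from=* under every date_from=* directory.
# flatten_two_levels is a hypothetical name; $1 is the top directory (e.g. folder1).
flatten_two_levels() (
    # Function body in ( ... ): shopt and directory changes stay local.
    shopt -s dotglob nullglob
    for top in "$1"/folder*/id=*/date_from=* ; do
        pushd "$top" > /dev/null || continue
        for sub1 in date_to=* ; do
            for sub2 in "$sub1"/sys_date_from=* ; do
                items=("$sub2"/*)
                # Skip mv for an empty directory: nullglob would leave it no source.
                if [ "${#items[@]}" -gt 0 ] ; then
                    mv -- "${items[@]}" .
                fi
                rmdir -- "$sub2"
            done
            # Runs once per date_to=*, after all of its sys_date_from=* are gone.
            rmdir -- "$sub1"
        done
        popd > /dev/null
    done
)

# Usage (from the directory that contains folder1):
#   flatten_two_levels folder1
```

Because rmdir on the date_to=* directory is issued only after the inner loop has finished, it no longer complains about a directory that is merely not empty yet.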

    Also assumes file permissions allow the pushd and mv - adding appropriate error-checking is advisable.

    Also add checks for overwriting unless the subdirectory contents are guaranteed to have unique names when moved.
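
The two cautions above (error-checking and overwrite protection) could be combined along the following lines. This is a hedged sketch rather than a finished tool: flatten_checked is a hypothetical name, $1 is the top directory, and like the original script it assumes one sys_date_from=* directory per date_to=* directory.

```bash
#!/bin/bash
# Sketch: the same flattening, with failures and name clashes reported.
# flatten_checked is a hypothetical name; $1 is the top directory (e.g. folder1).
flatten_checked() (
    shopt -s dotglob nullglob
    status=0
    for top in "$1"/folder*/id=*/date_from=* ; do
        if ! pushd "$top" > /dev/null ; then
            echo "skipping $top: cannot enter it" >&2
            status=1
            continue
        fi
        for sub in date_to=*/sys_date_from=* ; do
            for item in "$sub"/* ; do
                name=${item##*/}            # last path element of the item
                if [ -e "$name" ] || [ -L "$name" ] ; then
                    echo "not moving $top/$item: $name already exists" >&2
                    status=1
                elif ! mv -- "$item" . ; then
                    status=1
                fi
            done
            # rmdir refuses non-empty directories, so unmoved items are kept.
            rmdir -- "$sub" "${sub%/*}" || status=1
        done
        popd > /dev/null
    done
    exit "$status"                          # becomes the function's return status
)

# Usage (from the directory that contains folder1):
#   flatten_checked folder1 || echo "some items were left in place" >&2
```

Anything that would have overwritten an existing name stays where it was, together with its two parent directories, so it can be inspected and moved by hand.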