Tags: bash, pattern-matching, batch-rename

How can I rename multiple folders under different conditions by matching regex patterns in a Bash script?


As an example, let's say I have a folder containing these folders:

Universal 2023 02 15 Some Name
Universal 2023 02 15 Some Name and Words After
Sony Some Name 2023 02 15
Sony Some Name 2023 02 15 and Words After

Desired output

Some Name - 2023 02 15 - Universal
Some Name - 2023 02 15 - And Words After - Universal
Some Name - 2023 02 15 - Sony
Some Name - 2023 02 15 - and Words After - Sony

I wrote a command for every name structure.

1. « Universal 2023 02 15 Some Name » will be renamed « Some Name - 2023 02 15 - Universal » with this command:

rename -v 's/([\s\S]+)\s((\d{4})\s(\d{2})\s(\d{2}))\s(([\s\S]+)\s([\s\S]+))/$6 - $2 - $1/g' *

2. « Universal 2023 02 15 Some Name and Words After » will be renamed « Some Name - 2023 02 15 - And Words After - Universal » with this command:

rename -v 's/([\s\S]+)\s((\d{4})\s(\d{2})(\s)(\d{2}))\s((\w+)\s(\w+))\s([\s\S]+)/$7 - $2 - $10 - $1/g' *

3. « Sony Some Name 2023 02 15 » will be renamed « Some Name - 2023 02 15 - Sony » with this command:

rename -v 's/([\s\S]+)\s((\w+)\s(\w+))\s((\d{4})\s(\d{2})\s(\d{2}))/$2 - $5 - $1/g' *

4. Finally, « Sony Some Name 2023 02 15 and Words After » will be renamed « Some Name - 2023 02 15 - and Words After - Sony » with this command:

rename -v 's/([\s\S]+)\s((\d{4})\s(\d{2})\s(\d{2}))\s(([\w]+)\s([\w]+))\s([\s\S]+)/$6 - $2 - $9 - $1/g' *

When I want to rename these folders, I have to sort them into separate folders, run the corresponding command in each one, and then move them all back into the same folder when I'm done. This is very annoying, so I thought of writing a Bash script that does everything in the main folder without sorting the folders first. In VS Code, everything seems to work fine except for the renaming commands. The following line is colored orange, which I take to mean that something is missing, but I don't know what it is:

's/([\s\S]+)\s((\d{4})\s(\d{2})\s(\d{2}))\s(([\w]+)\s([\w]+))\s([\s\S]+)/$6 - $2 - $9 - $1/g'

See this link to view the script with VS Code's syntax coloring: https://i.sstatic.net/tosSv.png

My script :

for i in $*/; do
        # for Universal 2023 02 15 Some Name
        if [[ "$i" =~ ([\s\S]+)\s((\d{4})\s(\d{2})\s(\d{2}))\s(([\s\S]+)\s([\s\S]+)) ]];
                then
                        rename -v 's/([\s\S]+)\s((\d{4})\s(\d{2})\s(\d{2}))\s(([\s\S]+)\s([\s\S]+))/$6 - $2 - $1/g' *
        
        # for Universal 2023 02 15 Some Name and Words After
        elif [[ "$i" =~ ([\s\S]+)\s((\d{4})\s(\d{2})(\s)(\d{2}))\s((\w+)\s(\w+))\s([\s\S]+) * ]];
                then
                        rename -v 's/([\s\S]+)\s((\d{4})\s(\d{2})(\s)(\d{2}))\s((\w+)\s(\w+))\s([\s\S]+)/$7 - $2 - $10 - $1/g' *

        # for Sony Some Name 2023 02 15
        elif [[ "$i" =~ ([\s\S]+)\s((\w+)\s(\w+))\s((\d{4})\s(\d{2})\s(\d{2})) ]];
                then
                        rename -v 's/([\s\S]+)\s((\w+)\s(\w+))\s((\d{4})\s(\d{2})\s(\d{2}))/$2 - $5 - $1/g' *
        
        # for Sony Some Name 2023 02 15 and Words After
        else [[ "$i" =~ ([\s\S]+)\s((\w+)\s(\w+))\s((\d{4})\s(\d{2})\s(\d{2})) ]];
                then
                        rename -v 's/([\s\S]+)\s((\d{4})\s(\d{2})\s(\d{2}))\s(([\w]+)\s([\w]+))\s([\s\S]+)/$6 - $2 - $9 - $1/g' *
        
        fi

done

The script as colored by VS Code. My commands are all orange...

Can anyone help me, please? Many thanks! Martin


Solution

  • Try this ShellCheck-clean code:

    #! /bin/bash -p
    
    sep_rx='[[:space:]]+'
    part_rx='[^[:space:]]+'
    company_rx=$part_rx
    name_rx="${part_rx}${sep_rx}${part_rx}"
    date_rx="[[:digit:]]{4}${sep_rx}[[:digit:]]{2}${sep_rx}[[:digit:]]{2}"
    after_rx="${part_rx}(${sep_rx}${part_rx})*"
    
    cdn_rx="^($company_rx)$sep_rx($date_rx)$sep_rx($name_rx)\$"
    cdna_rx="^($company_rx)$sep_rx($date_rx)$sep_rx($name_rx)$sep_rx($after_rx)\$"
    cnd_rx="^($company_rx)$sep_rx($name_rx)$sep_rx($date_rx)\$"
    cnda_rx="^($company_rx)$sep_rx($name_rx)$sep_rx($date_rx)$sep_rx($after_rx)\$"
    
    for d in */; do
        dir=${d%/}
        if [[ $dir =~ $cdn_rx ]]; then
            company=${BASH_REMATCH[1]}
            date=${BASH_REMATCH[2]}
            name=${BASH_REMATCH[3]}
            newdir="$name - $date - $company"
        elif [[ $dir =~ $cdna_rx ]]; then
            company=${BASH_REMATCH[1]}
            date=${BASH_REMATCH[2]}
            name=${BASH_REMATCH[3]}
            words_after=${BASH_REMATCH[4]}
            newdir="$name - $date - $words_after - $company"
        elif [[ $dir =~ $cnd_rx ]]; then
            company=${BASH_REMATCH[1]}
            name=${BASH_REMATCH[2]}
            date=${BASH_REMATCH[3]}
            newdir="$name - $date - $company"
        elif [[ $dir =~ $cnda_rx ]]; then
            company=${BASH_REMATCH[1]}
            name=${BASH_REMATCH[2]}
            date=${BASH_REMATCH[3]}
            words_after=${BASH_REMATCH[4]}
            newdir="$name - $date - $words_after - $company"
        else
            printf 'ERROR: Failed to match: %s\n' "$dir" >&2
            exit 1
        fi
        mv -v -- "$dir" "$newdir"
    done
    
    • The long, and duplicated, regular expressions in the original code are very difficult to read, so I've tried to break them down into named parts.
    • Regular expression extensions such as \s, \S, and \d don't work consistently with =~ in Bash, so I've used portable character classes instead (e.g. [^[:space:]] for \S).
    • See mklement0's excellent answer to How do I use a regex in a shell script? to learn more about using regular expressions in Bash code.
    • See Bash Pitfalls #35 (if [[ $foo =~ 'some RE' ]]) for an explanation of why I put all the regular expressions in variables.
    • The rename utility isn't available on all systems, and there are at least two very different versions of it in circulation, so I've used the standard mv utility instead. See Why is the rename utility on Debian/Ubuntu different than the one on other distributions, like CentOS?.
    • The code works on the given examples, but it may well fail on other directory names. You'll need to check the regular expressions and modify them as necessary.
    • The -p in the #! /bin/bash -p shebang prevents Bash from reading configuration files and environment variables that could change how it behaves (e.g. by defining functions that override standard utilities, or by defining environment variables that make standard utilities behave in non-standard ways). It makes Bash programs more reliable, and reduces the "it works on my machine" effect. It may also avoid some security issues (see Shell Script Security - Apple Developer).
    • The parentheses in regular expression strings like "^($company_rx)$sep_rx($date_rx)$sep_rx($name_rx)\$" delimit "capture groups". Matches for the parts of a regular expression between parentheses are copied into the BASH_REMATCH array. For instance, the second set of parentheses in the given string surrounds the date pattern, so the matched date is copied into index 2 of BASH_REMATCH (${BASH_REMATCH[2]}). The \$ at the end of the regular expression is a backslash-escaped dollar character. The backslash prevents the dollar from triggering an expansion (which it normally does within double quotes), so the regular expression receives a literal $, the metacharacter that matches the end of the string being matched. See POSIX Extended Regular Expressions for a full description of the regular expressions that Bash supports. (Some implementations, inconsistently, also support extensions such as \s.)
    • The */ in for d in */ expands to the list of slash-terminated names (excluding names beginning with the dot character (.)) of directories under the current directory. See glob - Greg's Wiki.
    • dir=${d%/} causes dir to get the value of d with a trailing slash removed. See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)).
    • printf 'ERROR: Failed to match: %s\n' "$dir" prints the string ERROR: Failed to match: %s with a trailing newline and with %s replaced by the value of dir. It is a safer version of echo "ERROR: Failed to match: $dir", which doesn't work in general. See the accepted, and excellent, answer to Why is printf better than echo? for more information. See the POSIX printf page for detailed information about the printf utility.
    • The >&2 at the end of printf 'ERROR: ...' "$dir" >&2 causes the output to go to the "standard error" stream ("stderr") instead of the "standard output" stream ("stdout"). One practical consequence of this is that the error message will be visible even if the (standard) output of the program is redirected. It is normal to do that for error messages, and other diagnostic messages (warning, debugging, ...). See BashGuide/InputAndOutput - Greg's Wiki (wooledge.org).
    • The -- in mv -v -- "$dir" "$newdir" ensures that there will not be a problem if the code is ever used with names that begin with a hyphen/dash (-), even if the code is copied into a different program. Without the --, a leading hyphen would cause a directory name to be interpreted as a string of options to mv. See Bash Pitfalls #2 (cp $file $target) and Bash Pitfalls #3 (Filenames with leading dashes).
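
Since the regular expressions will probably need adjusting for other directory names, it may help to preview the renames before running mv. The following is only a sketch of one way to do that, not part of the code above. The function name new_name_for and the rest_rx variable are names made up for this illustration. It also assumes, like the code above, that the name is exactly two words, and it folds the four patterns into two by making the trailing words an optional capture group. Unmatched names are skipped rather than treated as fatal errors.

```shell
#!/bin/bash
# Sketch only: new_name_for and rest_rx are illustrative names, not from the code above.

sep_rx='[[:space:]]+'
part_rx='[^[:space:]]+'
name_rx="${part_rx}${sep_rx}${part_rx}"   # assumption: the name is exactly two words
date_rx="[[:digit:]]{4}${sep_rx}[[:digit:]]{2}${sep_rx}[[:digit:]]{2}"
rest_rx="(${sep_rx}(.+))?"                # optional trailing words; the inner group is capture 5

# Company first, then date and name in either order, then optional trailing words.
cdn_rx="^(${part_rx})${sep_rx}(${date_rx})${sep_rx}(${name_rx})${rest_rx}\$"
cnd_rx="^(${part_rx})${sep_rx}(${name_rx})${sep_rx}(${date_rx})${rest_rx}\$"

# Print the reordered name for $1, or return 1 if neither pattern matches.
new_name_for() {
    local old=$1 company date name after
    if [[ $old =~ $cdn_rx ]]; then
        company=${BASH_REMATCH[1]}
        date=${BASH_REMATCH[2]}
        name=${BASH_REMATCH[3]}
    elif [[ $old =~ $cnd_rx ]]; then
        company=${BASH_REMATCH[1]}
        name=${BASH_REMATCH[2]}
        date=${BASH_REMATCH[3]}
    else
        return 1
    fi
    after=${BASH_REMATCH[5]}              # empty when there are no trailing words
    printf '%s - %s%s - %s\n' "$name" "$date" "${after:+ - $after}" "$company"
}

# Dry run: report what would be renamed without touching anything.
for d in */; do
    [[ -d $d ]] || continue               # the glob stays literal when nothing matches
    dir=${d%/}
    if new=$(new_name_for "$dir"); then
        printf 'would rename: %s -> %s\n' "$dir" "$new"
    else
        printf 'skipped (no pattern matched): %s\n' "$dir" >&2
    fi
done
```

Once the preview looks right, the first printf in the loop could be replaced with mv -v -- "$dir" "$new" to perform the renames.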