Tags: bash, pattern-matching, filenames, glob

Bash to match pattern in filename then add/edit


I'm sure this has been answered before, but I can't seem to use the right search terms to find it.

I'm trying to write a bash script that can recognize, sort, and rename files based on patterns in their names.

Take this filename, for example: BBC Something Something 3 of 5 Blah 2007.avi

Since this filename starts with BBC and contains something that matches the pattern "DIGIT of DIGIT", I would like the script to rename it by removing the BBC at the front, inserting the string "s01e0" in front of the 3, and removing the "of 5", turning it into Something Something s01e03 Blah 2007.avi

In addition, I'd like the script to recognize and handle differently a file named, for example, BBC Something Else 2009.mkv. Since that filename starts with BBC and ends with a year but does not contain the "DIGIT of DIGIT" pattern, the script should rename it by inserting the word "documentaries" after BBC and moving the year right after that, so that the filename becomes BBC documentaries 2009 Something Else.mkv

I hope this isn't asking for too much help... I've been working on this myself all day, but this is literally all I've got:

    topic1 () {
    if [ "$2" = "bbc*[:digit:] of [:digit:]" ]; then

And then nothing. I'd love some help! Thanks!
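The test in the attempt above cannot match: inside `[ ... ]`, `=` compares the two strings literally, so the right-hand side is never treated as a pattern. Bash's `[[ ... ]]` does glob matching when the pattern on the right of `==` is left unquoted, and a character class has to sit inside a bracket expression (`[[:digit:]]`, not `[:digit:]`). Matching is also case-sensitive by default, so "bbc" would not match "BBC". A minimal sketch of just this test, assuming bash; the function name `is_bbc_episode` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the name starts with "BBC" and contains
# "DIGIT of DIGIT" somewhere after it. Case-sensitive unless
# "shopt -s nocasematch" is set.
is_bbc_episode() {
    # Unquoted parts are glob syntax; the quoted ' of ' is matched literally.
    [[ $1 == BBC*[[:digit:]]' of '[[:digit:]]* ]]
}

for name in "BBC Something Something 3 of 5 Blah 2007.avi" \
            "BBC Something Else 2009.mkv"; do
    if is_bbc_episode "$name"; then
        echo "episode:     $name"
    else
        echo "not episode: $name"
    fi
done
```

This only decides which rule applies; changing the name still needs something like sed or `BASH_REMATCH`, because a glob match does not capture the pieces it matched.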


Solution

  • Use grep to match filenames that need to be changed and then sed to actually change them:

    #!/bin/bash
    
    get_name()
    {
        local FILENAME="${1}"
        local NEWNAME=""
        local MATCH_EPISODE MATCH_DOCUMENTARY
    
        # check if input matches our criteria: "^" anchors BBC at the start,
        # and a documentary must end with a year right before the extension
        MATCH_EPISODE=$(printf '%s\n' "${FILENAME}" | grep -c '^BBC.*[0-9] of [0-9]')
        MATCH_DOCUMENTARY=$(printf '%s\n' "${FILENAME}" | grep -c '^BBC.*[0-9]\{4\}\.[^.]*$')
    
        # if it matches then modify
        if [ "${MATCH_EPISODE}" = "1" ]; then
    
            NEWNAME=$(printf '%s\n' "${FILENAME}" | sed -e 's/BBC\(.*\)\([0-9]\) of [0-9]\(.*\)/\1 s01e0\2 \3/')
    
        elif [ "${MATCH_DOCUMENTARY}" = "1" ]; then
    
            NEWNAME=$(printf '%s\n' "${FILENAME}" | sed -e 's/BBC\(.*\)\([0-9]\{4\}\)\(.*\)/BBC documentaries \2 \1 \3/')
    
        fi
    
        # clean up: remove leading spaces, squeeze runs of spaces, drop a space before the dot
        printf '%s\n' "${NEWNAME}" | sed -e 's/^ *//' -e 's/  */ /g' -e 's/ \././g'
    }
    
    FN1="BBC Something Something 3 of 5 Blah 2007.avi"
    FN2="BBC Something Else 2009.mkv"
    FN3="Something Not From BBC.mkv"
    
    NN1=$(get_name "${FN1}")
    NN2=$(get_name "${FN2}")
    NN3=$(get_name "${FN3}")
    
    echo "${FN1} -> ${NN1}"
    echo "${FN2} -> ${NN2}"
    echo "${FN3} -> ${NN3}"
    

    The output is:

    BBC Something Something 3 of 5 Blah 2007.avi -> Something Something s01e03 Blah 2007.avi
    BBC Something Else 2009.mkv -> BBC documentaries 2009 Something Else.mkv
    Something Not From BBC.mkv -> 
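`get_name` only prints the proposed name; the question also asks for the files to be renamed. Below is a minimal sketch of that step, assuming bash and that all files sit in one directory. To keep the block self-contained it re-implements the two rules with bash's built-in `[[ ... =~ ... ]]` operator and the `BASH_REMATCH` array instead of grep and sed (a swapped-in technique, not the approach above). The function names `bbc_new_name` and `rename_bbc_files` are hypothetical, and the episode rule assumes there is a space on both sides of "DIGIT of DIGIT".

```shell
#!/bin/bash

# Hypothetical helper: print the new name for one file name, or nothing
# if neither rule applies. Mirrors the two rules from the question.
bbc_new_name() {
    local name=$1
    # Keeping the regexes in variables avoids quoting problems with =~.
    local episode_re='^BBC (.*) ([0-9]) of [0-9] (.*)$'
    local year_re='^BBC (.*) ([0-9]{4})(\.[^.]*)$'

    if [[ $name =~ $episode_re ]]; then
        # "BBC A 3 of 5 B.avi" -> "A s01e03 B.avi"
        printf '%s s01e0%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    elif [[ $name =~ $year_re ]]; then
        # "BBC A 2009.mkv" -> "BBC documentaries 2009 A.mkv"
        printf 'BBC documentaries %s %s%s\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}"
    fi
}

# Hypothetical driver: rename every matching file in one directory.
rename_bbc_files() {
    local dir=${1:-.} path base new
    for path in "$dir"/BBC*; do
        [[ -f $path ]] || continue      # also skips the unexpanded glob if nothing matches
        base=${path##*/}
        new=$(bbc_new_name "$base")
        [[ -n $new ]] || continue       # neither rule applied
        if [[ -e $dir/$new ]]; then
            printf 'skipping %s: %s already exists\n' "$base" "$new" >&2
            continue
        fi
        mv -- "$path" "$dir/$new"
        printf '%s -> %s\n' "$base" "$new"
    done
}
```

For example, `rename_bbc_files ~/Videos` renames the matching files in that directory and prints each change; commenting out the `mv` line first gives a cheap dry run. Existing targets are skipped rather than overwritten.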
    

    Let's look at one of the sed invocations:

    sed -e 's/BBC\(.*\)\([0-9]\) of [0-9]\(.*\)/\1 s01e0\2 \3/'
    

    We use capture groups to match interesting portions of the filename:

    • BBC - match literal BBC,
    • \(.*\) - match as much as possible (.* is greedy) and remember it in capture group 1, up to
    • \([0-9]\) - a digit, remember it in capture group 2, then
    • of [0-9] - match the literal " of " and a digit (not captured, so "of 5" is dropped),
    • \(.*\) - match rest and remember it in capture group 3

    and then put them in positions we want:

    • \1 - content of capture group 1, i.e. everything between "BBC" and the episode digit
    • s01e0 - literal " s01e0"
    • \2 - content of capture group 2, i.e. episode number
    • \3 - content of capture group 3, i.e. everything else
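One way to see exactly what each group captures is to print the groups between brackets instead of rearranging them. The sketch below runs the answer's episode expression with only the replacement changed, and does the same for the documentary expression; the brackets are just display markers.

```shell
#!/bin/bash
# Print each capture group in brackets so the surrounding spaces are visible.
echo "BBC Something Something 3 of 5 Blah 2007.avi" |
    sed -e 's/BBC\(.*\)\([0-9]\) of [0-9]\(.*\)/[\1][\2][\3]/'
# -> [ Something Something ][3][ Blah 2007.avi]

echo "BBC Something Else 2009.mkv" |
    sed -e 's/BBC\(.*\)\([0-9]\{4\}\)\(.*\)/[\1][\2][\3]/'
# -> [ Something Else ][2009][.mkv]
```

Because `.*` is greedy, group 1 extends as far right as the rest of the pattern allows, so it keeps the spaces on both sides of the title.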

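    Because groups 1 and 3 keep the spaces around them, this leaves extra spaces in the new name (a leading space, double spaces, or a space before the dot), so the last sed invocation in get_name cleans them up.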
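The clean-up step can be tried on its own. Fed the raw result of the episode substitution for the first example file, the three expressions strip the leading space, collapse the runs of spaces and remove a space before the dot. This sketch uses `s/  */ /g`, which collapses a run of any length to one space; `s/  / /g` only replaces non-overlapping pairs, so a run of three spaces would still leave two.

```shell
#!/bin/bash
# Raw result of the episode substitution for the first example file.
raw=" Something Something  s01e03  Blah 2007.avi"

# 's/^ *//'     : remove leading spaces
# 's/  */ /g'   : a space followed by any number of spaces -> one space
# 's/ \././g'   : remove a space right before a dot
printf '%s\n' "$raw" | sed -e 's/^ *//' -e 's/  */ /g' -e 's/ \././g'
# -> Something Something s01e03 Blah 2007.avi

# Pairs versus runs, on three spaces:
printf 'a   b\n' | sed -e 's/  / /g'    # -> a  b (two spaces remain)
printf 'a   b\n' | sed -e 's/  */ /g'   # -> a b
```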