Tags: regex, shell, osx-mountain-lion

Checking/moving files based on regex


I'm trying to write a shell script on a Mac that will move raw camera files. These files need to be named very specifically. I have the regex to check the file names against, but I've had zero luck making it work correctly. What does work is looking inside a specific folder for a subfolder containing the raw files and getting a list of those files.

I'm also trying to add error checking, and I've been using if statements to check the filenames.

I need help writing the if statement that checks whether the files are named correctly.

I would greatly appreciate any help as I'm completely stuck at this point.

Here's what I have so far:

#!/bin/bash

product="^[A-Z0-9]{2}\w[A-Z0-9]{6,7}\w[A-Z]{1}\.(EIP)"

#folder of files to check
folder_files="$(ls -d *)"

#just get a list of everything .EIP
FILES_LIST="$(ls *.EIP)"

for file in $FILES_LIST; do
#something with $file

echo $file

#where I'm having trouble
If (grep or find based on $product)
then

    #move files, create log

else

    #move files to an error folder for renaming

fi

done
exit 0

Solution

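  • Unescaped curly braces such as {2} and {6,7} are interval expressions in extended regular expression (ERE) syntax; in basic regular expression (BRE) syntax they are literal characters (the BRE spelling is \{2\}). So we need to use egrep, which is equivalent to grep -E. I also took the liberty of removing the parentheses around EIP from your regex, since you are only looking for files ending in .EIP and no group is needed. This leaves us with: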

    product="^[A-Z0-9]{2}\w[A-Z0-9]{6,7}\w[A-Z]{1}\.EIP"
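
    To see the ERE/BRE difference concretely, here is a small sketch that feeds the same pattern to grep -E (what egrep does) and to plain grep. Two assumptions differ from the answer's regex: \w is written as the POSIX class [[:alnum:]_], because \w is an extension rather than standard ERE, and a trailing $ anchor is added so that names such as AB_1234567_C.EIP.bak are rejected. The function names are hypothetical.

    ```shell
    #!/bin/sh
    # Assumed pattern: the answer's regex with \w written as the POSIX class
    # [[:alnum:]_] and a trailing $ anchor added (neither is in the original).
    product='^[A-Z0-9]{2}[[:alnum:]_][A-Z0-9]{6,7}[[:alnum:]_][A-Z]\.EIP$'

    # Exit status 0 if $1 matches $product read as an extended regex
    # (grep -E, i.e. what egrep does).
    matches_ere() {
        printf '%s\n' "$1" | grep -Eq "$product"
    }

    # Same pattern read as a basic regex: {2} and {6,7} are now literal
    # text, so ordinary camera filenames no longer match.
    matches_bre() {
        printf '%s\n' "$1" | grep -q "$product"
    }
    ```

    With these definitions, matches_ere AB_1234567_C.EIP succeeds, while matches_bre on the same name fails because it is looking for a literal "{2}" after the first character.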
    

    We also need to change the $IFS variable, which controls how the shell splits the unquoted $FILES_LIST into the words the for loop iterates over. By default IFS contains a space, a tab and a newline, which does not work well when the separator can appear inside an item (for example, a filename containing a space gets split in two). We store the current value of IFS in a variable and then set IFS to a newline followed by a backspace; the trailing \b is there because command substitution strips trailing newlines, so a lone \n would be lost:

    SAVEIFS=$IFS
    IFS=$(echo -en "\n\b")
    

    When we are done we are going to restore the IFS to its original value:

    IFS=$SAVEIFS
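
    The effect of IFS on the loop can be checked in a throwaway directory. This sketch (hypothetical function names) counts loop iterations over two files, one with a space in its name, in three ways: splitting the ls output with the default IFS, splitting it with IFS set to newline plus backspace as above, and iterating over the glob *.EIP directly, which never word-splits filenames and so needs no IFS change at all. Each function body runs in a subshell ( ... ), so its IFS change is discarded when it returns, which is an alternative to saving and restoring SAVEIFS.

    ```shell
    #!/bin/sh
    # Count iterations when the unquoted $(ls ...) is split with the
    # default IFS (space, tab, newline): "A B.EIP" becomes two words.
    count_via_ls() {
        (
            cd "$1" || exit 1
            n=0
            for file in $(ls *.EIP); do
                n=$((n + 1))
            done
            echo "$n"
        )
    }

    # Same loop, but IFS is newline + backspace. printf '\n\b' is the POSIX
    # spelling of echo -en "\n\b"; the \b keeps $(...) from dropping the \n.
    count_via_ls_newline_ifs() {
        (
            cd "$1" || exit 1
            IFS=$(printf '\n\b')
            n=0
            for file in $(ls *.EIP); do
                n=$((n + 1))
            done
            echo "$n"
        )
    }

    # Let the shell expand the glob: one word per file, spaces included.
    count_via_glob() {
        (
            cd "$1" || exit 1
            n=0
            for file in *.EIP; do
                [ -e "$file" ] || continue   # no match: pattern left unexpanded
                n=$((n + 1))
            done
            echo "$n"
        )
    }
    ```

    In a directory holding only "A B.EIP" and "C.EIP", the three functions should print 3, 2 and 2 respectively.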
    

    Now we pipe the filename to egrep and filter it with our regex, redirecting both stdout and stderr to /dev/null. The $? variable then holds egrep's exit status: 0 if the filename matched, 1 if it did not, and 2 if an error occurred.

    echo "$file" | egrep "$product" &>/dev/null
    if [ $? -eq 0 ]; then 
      echo "$file - acceptable"
    else 
      echo "$file - not acceptable"
    fi
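
    The $? check can also be folded into the if itself, since if branches on the exit status of the last command in the pipeline (here grep). A sketch, assuming \w is spelled [[:alnum:]_] and a trailing $ anchor is added; classify is a hypothetical name. grep -q prints nothing, so no redirection is needed; that matters because &> is a bash feature, and under plain sh "cmd &>/dev/null" runs cmd in the background instead. printf '%s\n' is used instead of echo because echo may treat a leading - or backslashes in a name specially.

    ```shell
    #!/bin/sh
    # Assumed pattern (portable \w, trailing $ anchor added).
    product='^[A-Z0-9]{2}[[:alnum:]_][A-Z0-9]{6,7}[[:alnum:]_][A-Z]\.EIP$'

    # Print a verdict for one filename. grep -q exits 0 on a match and
    # non-zero otherwise, and if tests that status directly.
    classify() {
        if printf '%s\n' "$1" | grep -Eq "$product"; then
            echo "$1 - acceptable"
        else
            echo "$1 - not acceptable"
        fi
    }
    ```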
    

    Here is the complete script (tested on Mountain Lion):

    #!/bin/bash 
    
    product="^[A-Z0-9]{2}\w[A-Z0-9]{6,7}\w[A-Z]{1}\.EIP"
    
    FILES_LIST="$(ls *.EIP)"
    
    SAVEIFS=$IFS
    IFS=$(echo -en "\n\b")
    
    for file in $FILES_LIST; do
      echo "$file" | egrep "$product" &>/dev/null
      if [ $? -eq 0 ]; then 
        echo "$file - acceptable"
        #move files, create log
      else 
        echo "$file - not acceptable"
        #move files to an error folder for renaming
      fi
    done
    
    IFS=$SAVEIFS
    
    exit 0
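
    The two placeholder comments could be filled in along these lines. This is a sketch rather than a drop-in replacement: the destination directories and the log file are hypothetical parameters, the pattern uses [[:alnum:]_] for \w plus a trailing $ anchor, and it loops over the glob instead of parsing ls, so IFS never needs changing.

    ```shell
    #!/bin/sh
    # Assumed pattern (portable \w, trailing $ anchor added).
    product='^[A-Z0-9]{2}[[:alnum:]_][A-Z0-9]{6,7}[[:alnum:]_][A-Z]\.EIP$'

    # sort_eip_files SRC GOOD BAD LOG  (hypothetical helper)
    # Moves every *.EIP file in SRC into GOOD if its name matches $product,
    # otherwise into BAD, and appends one line per moved file to LOG.
    sort_eip_files() {
        src=$1 good=$2 bad=$3 log=$4
        mkdir -p "$good" "$bad" || return 1
        for path in "$src"/*.EIP; do
            [ -e "$path" ] || continue      # glob matched nothing
            name=${path##*/}                # strip the directory part
            if printf '%s\n' "$name" | grep -Eq "$product"; then
                mv -- "$path" "$good/" && printf 'OK    %s\n' "$name" >> "$log"
            else
                mv -- "$path" "$bad/" && printf 'ERROR %s\n' "$name" >> "$log"
            fi
        done
    }
    ```

    For example, sort_eip_files . ./checked ./rename_me ./eip_moves.log (all names made up) would sort the current folder. The log line is written only if mv succeeded, and mv -- protects against filenames that begin with a dash.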
    

    Note that you can check for compliance with N naming conventions by using several if blocks and only one else branch at the very end, as shown below:

    for file in $FILES_LIST; do
    
      echo "$file" | egrep "$regex1" &>/dev/null
      if [ $? -eq 0 ]; then 
        echo "$file - accepted by regex1"
        #move files, create log
        continue
      fi
    
      echo "$file" | egrep "$regex2" &>/dev/null
      if [ $? -eq 0 ]; then 
        echo "$file - accepted by regex2"
        #move files, create log
        continue
      fi
    
      echo "$file" | egrep "$regexN" &>/dev/null
      if [ $? -eq 0 ]; then 
        echo "$file - accepted by regexN"
        #move files, create log
      else 
        echo "$file - not acceptable"
        #move files to an error folder for renaming
      fi
    done
    

    Note the use of continue: it jumps straight to the next iteration of the for loop, so only one action is taken per file even when a filename complies with more than one naming convention.
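
    If the number of naming conventions grows, the repeated if/continue blocks can be replaced by a loop over a list of patterns that stops at the first match, which gives the same one-action-per-file behaviour. A sketch with a hypothetical helper, first_match; any patterns you pass it are your own.

    ```shell
    #!/bin/sh
    # first_match NAME PATTERN...  (hypothetical helper)
    # Prints the 1-based index of the first extended regex that NAME
    # matches and returns 0; prints nothing and returns 1 if none match.
    # Returning at the first hit plays the role of "continue".
    first_match() {
        name=$1
        shift
        i=0
        for re in "$@"; do
            i=$((i + 1))
            if printf '%s\n' "$name" | grep -Eq "$re"; then
                echo "$i"
                return 0
            fi
        done
        return 1
    }
    ```

    Inside the main loop, if rule=$(first_match "$file" "$regex1" "$regex2" "$regexN"); then ... else ... fi puts the number of the matching convention in rule, and the else branch handles files that match none of them.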