
extract parts of filenames with regex in bash


I want to extract some information from filenames using regex in bash, which I will use to rename them according to BIDS. Here are the filenames:

ACC_svs_ECC.nii.gz
ACC_svs_noECC.nii.gz
ACC_svs_ref.nii.gz
Lt Hippocampus_svs_ECC.nii.gz
Lt Hippocampus_svs_noECC.nii.gz
Lt Hippocampus_svs_ref.nii.gz

Those are literally the only possible filenames, not a pattern or an example. Each participant will have those 6 files for two of their sessions, and the filenames will stay the same until I change them to be BIDS compliant. The .json files are sidecar files, so each one has exactly the same filename as its .nii.gz file except for the .json extension.

Each filename has a brain region (ACC or Lt Hippocampus) and a type of MRS file (ECC, noECC, or ref). That is the information I need in the new filenames, so as I loop through each existing filename I'd like to extract it and use it in the new filename.

Here is what the code will look like:

#!/bin/bash

ID=$1 # participant ID= user input
ses=$2 # session no (MRI)= user input
bidsdir=path/path/sub-001/ses-MRI1/mrs/ # path to mrs folder as specified by BIDS

for file in "${bidsdir}"*; do # bidsdir ends in /, so this globs the files inside it
   voi= # either ACC or Lt Hippocampus
   type= # either ECC, noECC, or ref
   ext= # file extension, either nii.gz or json
   newfilename="sub-${ID}_ses-${ses}_voi-${voi}_acq-svs_${type}_mrs.${ext}"
   # rest of code to rename each file
   mv "$file" "${bidsdir}${newfilename}" # $file already includes the path
done

I'm used to using regex with Python, and I'm pretty good at it, but I don't have time to spend another hour figuring it out in bash. Here's how far I've gotten with the regex itself: (?P<voi>ACC|Lt\ Hippocampus)_svs_(?P<type>ECC|noECC|ref)


Solution

  • Assumptions:

    • file names will always contain 2 underscores and one or two periods
    • file names will always have the form: <brain-region> + _ + <some-string-to-ignore> + _ + <type> + { .nii.gz or .json }
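
If it helps, the two assumptions above can be checked before any parsing is attempted. The following is only a sketch: `matches_assumed_format` is a hypothetical helper name, not part of the original answer, and the pattern is one possible way to express "three fields without `_` or `.`, followed by .nii.gz or .json".

```shell
#!/bin/bash

# Hypothetical helper: succeeds only if the name has the assumed layout
# <brain-region>_<ignored>_<type> followed by .nii.gz or .json.
matches_assumed_format() {
    local name=$1
    # Keep the regex in a variable so it is used unquoted in [[ =~ ]]
    local re='^[^_.]+_[^_.]+_[^_.]+\.(nii\.gz|json)$'
    [[ $name =~ $re ]]
}

# Small demo on two names from the question and one unrelated name
for name in 'ACC_svs_ECC.nii.gz' 'Lt Hippocampus_svs_ref.json' 'notes.txt'; do
    if matches_assumed_format "$name"; then
        echo "ok:   $name"
    else
        echo "skip: $name"
    fi
done
```

A check like this lets a loop skip files that would otherwise be split into the wrong variables.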

    Adding a .json entry to OP's list of file names:

    $ ls -1 /tmp/testd
    ACC_svs_ECC.nii.gz
    ACC_svs_noECC.nii.gz
    ACC_svs_ref.nii.gz
    'Lt Hippocampus_svs_ECC.nii.gz'
    'Lt Hippocampus_svs_noECC.nii.gz'
    'Lt Hippocampus_svs_ref.json'
    'Lt Hippocampus_svs_ref.nii.gz'
    

    In this particular case I'd skip the complexity of a regex and combine parameter expansion (to strip off the path) with the bash read builtin, using both _ and . as delimiters, to parse each file name into the desired variables. Because read assigns whatever is left of the line to its last variable, ext receives nii.gz in full rather than just nii:

    ID='myid'
    ses='myses'
    bidsdir='/tmp/testd'
    
    for path_file in "${bidsdir}"/*{.nii.gz,.json}
    do
        oldfilename="${path_file##*/}"                         # strip off the path (via parameter expansion)
        IFS='_.' read -r voi x type ext <<< "${oldfilename}"   # parse old file name into variables based on dual delimiters "_" and "."
    
        newfilename="sub-${ID}_ses-${ses}_voi-${voi}_acq-svs_${type}_mrs.${ext}"
    
        echo "path/file = ${path_file}"
        echo "old file  = ${oldfilename}"
        echo "new file  = ${newfilename}"
        echo ""
    done
    

    This generates:

    path/file = /tmp/testd/ACC_svs_ECC.nii.gz
    old file  = ACC_svs_ECC.nii.gz
    new file  = sub-myid_ses-myses_voi-ACC_acq-svs_ECC_mrs.nii.gz
    
    path/file = /tmp/testd/ACC_svs_noECC.nii.gz
    old file  = ACC_svs_noECC.nii.gz
    new file  = sub-myid_ses-myses_voi-ACC_acq-svs_noECC_mrs.nii.gz
    
    path/file = /tmp/testd/ACC_svs_ref.nii.gz
    old file  = ACC_svs_ref.nii.gz
    new file  = sub-myid_ses-myses_voi-ACC_acq-svs_ref_mrs.nii.gz
    
    path/file = /tmp/testd/Lt Hippocampus_svs_ECC.nii.gz
    old file  = Lt Hippocampus_svs_ECC.nii.gz
    new file  = sub-myid_ses-myses_voi-Lt Hippocampus_acq-svs_ECC_mrs.nii.gz
    
    path/file = /tmp/testd/Lt Hippocampus_svs_noECC.nii.gz
    old file  = Lt Hippocampus_svs_noECC.nii.gz
    new file  = sub-myid_ses-myses_voi-Lt Hippocampus_acq-svs_noECC_mrs.nii.gz
    
    path/file = /tmp/testd/Lt Hippocampus_svs_ref.nii.gz
    old file  = Lt Hippocampus_svs_ref.nii.gz
    new file  = sub-myid_ses-myses_voi-Lt Hippocampus_acq-svs_ref_mrs.nii.gz
    
    path/file = /tmp/testd/Lt Hippocampus_svs_ref.json
    old file  = Lt Hippocampus_svs_ref.json
    new file  = sub-myid_ses-myses_voi-Lt Hippocampus_acq-svs_ref_mrs.json
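
For anyone who still prefers the regex from the question, here is a rough sketch of how it might be carried over to bash. Bash's `[[ =~ ]]` operator uses POSIX extended regular expressions, which have no named groups, so `(?P<voi>...)` and `(?P<type>...)` become numbered groups read back from `BASH_REMATCH`. The helper names `new_name` and `rename_all` are hypothetical, the extension group is an addition to the question's pattern, and the rename is a dry run that only prints the `mv` commands.

```shell
#!/bin/bash

# The question's pattern with named groups turned into numbered groups,
# plus a third group for the extension (an addition, not in the question).
re='^(ACC|Lt Hippocampus)_svs_(ECC|noECC|ref)\.(nii\.gz|json)$'

# Hypothetical helper: prints the new name for an old file name,
# or returns 1 if the old name does not match the pattern.
new_name() {
    local id=$1 ses=$2 old=$3
    [[ $old =~ $re ]] || return 1
    local voi="${BASH_REMATCH[1]}"
    local type="${BASH_REMATCH[2]}"
    local ext="${BASH_REMATCH[3]}"
    printf 'sub-%s_ses-%s_voi-%s_acq-svs_%s_mrs.%s\n' "$id" "$ses" "$voi" "$type" "$ext"
}

# Hypothetical helper: dry run over a directory. It prints the mv commands
# instead of running them; remove "echo" to rename for real.
rename_all() {
    local id=$1 ses=$2 dir=$3 path old new
    for path in "$dir"/*; do
        old="${path##*/}"                             # strip off the path
        new=$(new_name "$id" "$ses" "$old") || continue   # skip non-matching files
        echo mv -- "$path" "$dir/$new"
    done
}

new_name myid myses 'Lt Hippocampus_svs_ref.json'
```

Calling `rename_all myid myses /tmp/testd` would print one `mv` line per matching file. The space in `Lt Hippocampus` carries through to the new name, as in the output above; `${voi// /}` would strip it if spaces are unwanted in the final names.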
    
    