Tags: regex, bash, shell, unix, mv

Copy files into folders based on regex matches of file and folder names


I have a situation where I have a series of files as follows:

1234_A_data1_v1.ext
1234_A_data1_v2.ext
1234_A_data2_v1.ext
1234_A_data2_v2.ext
1234_B_data1_v1.ext
1234_B_data1_v2.ext
1234_B_data2_v1.ext
1234_B_data2_v2.ext
1234_AA_data1_v1.ext
1234_AA_data1_v2.ext
1234_AA_data2_v1.ext
1234_AA_data2_v2.ext
1234_BB_data1_v1.ext
1234_BB_data1_v2.ext
1234_BB_data2_v1.ext
1234_BB_data2_v2.ext

The regex string 1234_[A-Z]+ identifies the dataset. I want to create folders for each such dataset (based off of the filenames), and then move the corresponding files into said folders. For instance, 1234_A_data1_v1.ext, 1234_A_data1_v2.ext, 1234_A_data2_v1.ext, 1234_A_data2_v2.ext would be placed under the folder 1234_A.

I managed to create the folders as follows:

grep -o -E '^[0-9]+_[A-Z]+' seqnames | xargs echo | xargs mkdir

Which gave me:

1234_A
1234_A_data1_v1.ext
1234_A_data1_v2.ext
1234_A_data2_v1.ext
1234_A_data2_v2.ext
1234_B
1234_B_data1_v1.ext
1234_B_data1_v2.ext
1234_B_data2_v1.ext
1234_B_data2_v2.ext
1234_AA
1234_AA_data1_v1.ext
1234_AA_data1_v2.ext
1234_AA_data2_v1.ext
1234_AA_data2_v2.ext
1234_BB
1234_BB_data1_v1.ext
1234_BB_data1_v2.ext
1234_BB_data2_v1.ext
1234_BB_data2_v2.ext

Which is all well and good. But now, I don't know how to move the files into their respective folders, and I'm quite lost.

Any pointers on how I could accomplish this would be appreciated.

In particular, is there some way to do something like mv *<pattern>*filename *<pattern>*destination? I'm also interested in learning if there are other succinct (maybe proper?) ways to achieve this task.


Solution

  • Well, if all of these files follow the pattern you show and are in the same directory, this one-liner seems to work.

    $ for d in $( cut -f1-2 -d_ <(ls 1234_*) | sort -u ); do mkdir $d; mv ${d}_* $d; done
    

    This bash command uses a for loop, a pipeline (|), process substitution (<(...)), and command substitution ($(...)).

    ls 1234_* lists every file whose name matches that glob. cut -f1-2 -d_ splits each filename on _ and prints only the first two fields, rejoined with the delimiter (for example, 1234_A). sort -u sorts those prefixes and drops duplicates, leaving one entry per dataset; these are the directory names. The for loop then iterates over the unique prefixes, creating each directory with mkdir and moving the files that start with that prefix plus an underscore into it with mv.
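To see what each stage contributes, the pipeline can be run one step at a time. The sketch below does this in a scratch directory created with mktemp; the four sample filenames and the variable names tmp and prefixes are illustrative assumptions, not part of the original answer. It pipes ls into cut rather than using process substitution, which should give the same result here.

```shell
# Work in a scratch directory so nothing real is touched (illustrative setup).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch 1234_A_data1_v1.ext 1234_A_data1_v2.ext 1234_AA_data1_v1.ext 1234_B_data1_v1.ext

# Stage 1: list the files matching the glob, one name per line.
ls 1234_*

# Stage 2: keep only the first two underscore-separated fields of each name.
ls 1234_* | cut -f1-2 -d_

# Stage 3: sort and drop duplicates; what remains is one prefix per dataset
# (1234_A, 1234_AA and 1234_B for the sample files above).
prefixes=$(ls 1234_* | cut -f1-2 -d_ | sort -u)
printf '%s\n' "$prefixes"
```

Stage 2 prints 1234_A twice because two sample files share that prefix; sort -u is what reduces the list to one entry per directory.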

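    Use with caution and adjust as necessary. The command assumes the directory contains only files that follow this naming pattern, with no whitespace in their names, because the unquoted expansions are subject to word splitting. It is also not safe to re-run: once the directories exist, ls 1234_* lists their contents under header lines such as 1234_A:, so cut produces bogus prefixes such as 1234_A:, mkdir reports that the real directories already exist, and the ${d}_* glob no longer matches anything.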
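If those caveats matter, one possible variant avoids parsing ls output and can be run more than once. This is a sketch under assumptions, not the answer's method: group_by_prefix is a hypothetical helper name, and it assumes every file to be grouped has at least two underscores in its name. It derives the prefix with parameter expansion, quotes every expansion, and uses mkdir -p so an existing directory is not an error.

```shell
#!/usr/bin/env bash
# Sketch: move files named <field1>_<field2>_<rest> into <field1>_<field2>/.
# group_by_prefix is a hypothetical helper, not part of the original answer.
group_by_prefix() {
    local dir=${1:-.} f name rest prefix
    for f in "$dir"/*_*_*; do
        [ -f "$f" ] || continue        # skip directories and an unmatched glob
        name=${f##*/}                  # strip the leading path
        rest=${name#*_*_}              # everything after the second underscore
        prefix=${name%"_$rest"}        # first two fields, e.g. 1234_A
        mkdir -p -- "$dir/$prefix"     # -p: no error if it already exists
        mv -- "$f" "$dir/$prefix/"
    done
}
```

Calling group_by_prefix . from the directory that holds the files should produce the same layout as the one-liner. A second call finds no regular files left to move and does nothing.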

    Here's an example.

    $ ls -alF   # Show the files in the directory
    total 8
    drwxrwxr-x.  2 user user 4096 Jul 19 02:15 ./
    drwxrwxr-x. 34 user user 4096 Jul 19 02:02 ../
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_AA_data1_v1.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_AA_data1_v2.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_AA_data2_v1.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_AA_data2_v2.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_A_data1_v1.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_A_data1_v2.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_A_data2_v1.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_A_data2_v2.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_BB_data1_v1.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_BB_data1_v2.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_BB_data2_v1.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_BB_data2_v2.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_B_data1_v1.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_B_data1_v2.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_B_data2_v1.ext
    -rw-rw-r--.  1 user user    0 Jul 19 02:07 1234_B_data2_v2.ext
    $ for d in $( cut -f1-2 -d_ <(ls 1234_*) | sort -u ); do mkdir $d; mv ${d}_* $d; done  # the one-liner
    $ ls -alF  # show the directory now
    total 24
    drwxrwxr-x.  6 user user 4096 Jul 19 02:17 ./
    drwxrwxr-x. 34 user user 4096 Jul 19 02:02 ../
    drwxrwxr-x.  2 user user 4096 Jul 19 02:17 1234_A/
    drwxrwxr-x.  2 user user 4096 Jul 19 02:17 1234_AA/
    drwxrwxr-x.  2 user user 4096 Jul 19 02:17 1234_B/
    drwxrwxr-x.  2 user user 4096 Jul 19 02:17 1234_BB/
    $ tree .  # show the whole directory tree structure
    .
    ├── 1234_A
    │   ├── 1234_A_data1_v1.ext
    │   ├── 1234_A_data1_v2.ext
    │   ├── 1234_A_data2_v1.ext
    │   └── 1234_A_data2_v2.ext
    ├── 1234_AA
    │   ├── 1234_AA_data1_v1.ext
    │   ├── 1234_AA_data1_v2.ext
    │   ├── 1234_AA_data2_v1.ext
    │   └── 1234_AA_data2_v2.ext
    ├── 1234_B
    │   ├── 1234_B_data1_v1.ext
    │   ├── 1234_B_data1_v2.ext
    │   ├── 1234_B_data2_v1.ext
    │   └── 1234_B_data2_v2.ext
    └── 1234_BB
        ├── 1234_BB_data1_v1.ext
        ├── 1234_BB_data1_v2.ext
        ├── 1234_BB_data2_v1.ext
        └── 1234_BB_data2_v2.ext
    
    4 directories, 16 files
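On the question of driving mv directly from a regex: mv only receives the names that the shell's glob expansion hands it, so a regex has to be applied to each filename in a loop. A minimal sketch, assuming bash and its [[ ... =~ ... ]] operator, is shown below. move_by_regex is a hypothetical helper name, and the pattern is the one from the question with a capture group and a trailing underscore added. Character ranges such as [A-Z] can behave differently across locales, so setting LC_ALL=C may be worth considering.

```shell
#!/usr/bin/env bash
# Sketch: use a bash regex to pick the destination directory for each file.
# move_by_regex is a hypothetical helper, not part of the original answer.
move_by_regex() {
    local dir=${1:-.} f name
    local re='^([0-9]+_[A-Z]+)_'       # capture group 1 is the dataset prefix
    for f in "$dir"/*; do
        [ -f "$f" ] || continue        # only regular files are moved
        name=${f##*/}                  # strip the leading path
        if [[ $name =~ $re ]]; then
            mkdir -p -- "$dir/${BASH_REMATCH[1]}"
            mv -- "$f" "$dir/${BASH_REMATCH[1]}/"
        fi
    done
}
```

Files whose names do not match the pattern are left where they are, so unrelated files in the same directory are not disturbed.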