
Copying multiple files inside a Google Cloud bucket to different directories based on file name


Suppose I have multiple files in different sub-directories with names like 20060630 AD8,11 +1015.WAV and 20050508_Natoa_Enc1_AD5AK_1.WAV. I know that every file will contain a substring identifying its class, such as AD (in the first file) or AD and AK (in the second). There are 16 such classes in total (AD, AK, AN, etc.), and I have created an empty folder for each of them in the top-level directory.

I want to copy each of these files into its respective directory according to the substring it matches. With gsutil, a single copy would look like this (note the quotes, since the file names contain spaces):

gsutil cp "gs://bucket/Field/2005/20060630 AD8,11 +1015.WAV" "gs://bucket/AD/20060630 AD8,11 +1015.WAV"

How can I automate this task for thousands of files in the same bucket?

Is it safe to assume an approach like:

import subprocess

if 'AD' in filename:
    subprocess.run(['gsutil', 'cp', f'gs://bucket/{filename}', f'gs://bucket/AD/{filename}'])
elif 'AK' in filename:
    subprocess.run(['gsutil', 'cp', f'gs://bucket/{filename}', f'gs://bucket/AK/{filename}'])

Solution

  • You can write a simple Bash script for this. The code stays short because gsutil supports wildcards, and the ** wildcard recursively matches objects in all sub-directories, as shown below.

    #!/bin/bash
    
    bucket_name=my-example-bucket
    
    # One entry per class folder; extend this list to all 16 classes.
    substring_list=(
      AD
      AK
      AN
    )
    
    # Quote the wildcard so the local shell leaves it alone and gsutil
    # expands it itself. The -m flag parallelizes the copies, which
    # helps with thousands of files.
    for substring in "${substring_list[@]}"; do
      gsutil -m cp "gs://${bucket_name}/**/*${substring}*" "gs://${bucket_name}/${substring}/"
    done
    

    I also see that you have some Python experience, so you could alternatively use the Python Client for Google Cloud Storage: list the objects in the bucket, match each filename against your class substrings, and issue server-side copies.
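
    For example, here is a minimal sketch of that approach, assuming the google-cloud-storage package is installed and credentials are already configured (the bucket name and class list below are placeholders):

    from google.cloud import storage
    
    bucket_name = "my-example-bucket"    # placeholder
    substring_list = ["AD", "AK", "AN"]  # extend to all 16 classes
    
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    
    for blob in client.list_blobs(bucket_name):
        filename = blob.name.rsplit("/", 1)[-1]
        # Skip objects that are already inside one of the class folders.
        if blob.name.split("/", 1)[0] in substring_list:
            continue
        for substring in substring_list:
            if substring in filename:
                # Server-side copy; the file's bytes never leave the bucket.
                bucket.copy_blob(blob, bucket, f"{substring}/{filename}")

    One advantage over the wildcard script is that the matching happens in Python, so you have full control over how a filename maps to a class (for example, matching only exact tokens rather than any substring).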