Suppose I have multiple files in different sub-directories with names like 20060630 AD8,11 +1015.WAV and 20050508_Natoa_Enc1_AD5AK_1.WAV. Now I know that all these files will contain a substring like AD (in the first file) or AD and AK (in the second). There are 16 of these classes in total (AD, AK, AN, etc.) that I've created as empty folders in the top-level directory.
I want to copy each of these files into its respective directory according to the substring it matches. Using gsutil, a single command would look like:

gsutil cp "gs://bucket/Field/2005/20060630 AD8,11 +1015.WAV" "gs://bucket/AD/20060630 AD8,11 +1015.WAV"

How can this approach be automated for thousands of files in the same bucket?
Is it safe to assume an approach like:

if 'AD' in filename:
    gsutil cp gs://bucket/<filename> gs://bucket/AD/<filename>
elif 'AK' in filename:
    gsutil cp gs://bucket/<filename> gs://bucket/AK/<filename>
You can write a simple Bash script for this. The code stays short because gsutil supports wildcards and can recursively dive into sub-directories to find your files.
#!/bin/bash

bucket_name=my-example-bucket

# List all 16 class substrings here (AD, AK, AN, ...)
substring_list=(
  AD
  AK
  AN
)

for substring in "${substring_list[@]}"; do
  # The ** wildcard tells gsutil to search every sub-directory of the bucket;
  # quoting the URLs keeps the shell from trying to expand the wildcards itself.
  gsutil cp "gs://${bucket_name}/**/*${substring}*" "gs://${bucket_name}/${substring}/"
done
I also see that you have some Python experience, so you could alternatively leverage the Python Client for Google Cloud Storage along with a similar wildcard strategy.
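Here is a minimal sketch of that idea, assuming the google-cloud-storage package is installed and the class folders live in the same bucket; the bucket name and substring list below are placeholders you would replace with your own:

from google.cloud import storage

BUCKET_NAME = "my-example-bucket"   # placeholder bucket name
SUBSTRINGS = ["AD", "AK", "AN"]     # extend to all 16 classes

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# Walk every object in the bucket and copy it under each matching class prefix.
for blob in client.list_blobs(BUCKET_NAME):
    top_level = blob.name.split("/", 1)[0]
    if top_level in SUBSTRINGS:
        continue  # already sitting in one of the class folders, skip it
    filename = blob.name.rsplit("/", 1)[-1]  # drop the sub-directory path
    for substring in SUBSTRINGS:
        if substring in filename:
            bucket.copy_blob(blob, bucket, new_name=f"{substring}/{filename}")

Like the wildcard approach above, a file whose name contains two class substrings (such as AD and AK) ends up copied into both folders.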