Tags: google-cloud-platform, google-cloud-storage, gcloud

Syncing files between projects/buckets in Google Cloud Storage


I am trying to synchronize files between two projects and two buckets on Google Cloud.

However, I would like to copy only files that are in A (source) but not in B (destination). It is fine, and in fact preferred, to overwrite files that exist in both A and B.

When I do the following:

  • In my source bucket, I create a folder test and add a folder A containing file-1.
  • I run the following command: gsutil cp -r gs://from-project.appspot.com/test gs://to-project.appspot.com/test2

This works fine, and I have the folder A within the folder test2 in my to-project bucket.

Then the problem occurs:

  • I add a folder B, and within folder A I delete file-1 and add file-2 (to test the case where a file is in the source but not in the destination).
  • When I run the same command again, I expect only file-2 and the new folder B to be copied. Instead, I get a new folder named test inside test2, which contains A and B but without file-1 in A (basically a replica of the new source state).


Why does this happen, and how can I prevent it so that syncing works?


Solution

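  • The gsutil rsync command is the preferred way to synchronize the contents of two buckets. The nesting you see comes from gsutil cp -r following Unix cp naming rules: on the first run test2 did not exist, so the contents of test were copied into it; on the second run test2 already existed, so test itself was copied underneath it.

    You can use the -d option to delete files in the destination bucket that are not found in the source bucket. Be careful, though, because it can delete files in the destination bucket that you meant to keep.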
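As a rough sketch of how the rsync approach could look for the buckets in the question: the wrapper below builds a gsutil rsync call with -r (recurse into subfolders), optionally -d (delete destination objects missing from the source), and optionally -n (dry run, print what would happen without doing it). The function name sync_prefix, the mode argument, and the GSUTIL and DRY_RUN variables are hypothetical helpers introduced here for illustration; only rsync and its -r, -d, and -n flags come from gsutil.

```shell
#!/bin/sh
# Sketch only: sync_prefix, GSUTIL and DRY_RUN are illustrative names,
# not part of gsutil. The flags -r, -d and -n are real gsutil rsync options.

sync_prefix() {
  src="$1"            # e.g. gs://from-project.appspot.com/test
  dst="$2"            # e.g. gs://to-project.appspot.com/test2
  mode="${3:-keep}"   # "keep" leaves extra destination files, "mirror" deletes them

  flags="-r"                          # recurse into subfolders
  if [ "$mode" = "mirror" ]; then
    flags="$flags -d"                 # delete destination objects not in source
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    flags="$flags -n"                 # dry run: report actions, change nothing
  fi

  # $flags is intentionally unquoted so each flag becomes its own argument.
  "${GSUTIL:-gsutil}" rsync $flags "$src" "$dst"
}

# Example calls (commented out so that sourcing this file has no side effects):
# sync_prefix gs://from-project.appspot.com/test gs://to-project.appspot.com/test2
# DRY_RUN=1 sync_prefix gs://from-project.appspot.com/test gs://to-project.appspot.com/test2 mirror
```

Without -d, rsync copies new and changed files and leaves everything else in the destination alone, which matches the "copy what is missing, overwrite what differs" behavior asked for. Running the mirror variant with DRY_RUN=1 first is a cheap way to see what -d would remove before committing to it.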