google-cloud-storage

Performance of gsutil cp command has declined


We have observed that the gsutil cp command, used to copy a single file to Google Cloud Storage, performed better when only a few such processes were running, each copying a different file to a different location in storage. The typical speed then was ~50 Mbps. But as the number of concurrent "gsutil cp" processes has increased, the average speed these days has dropped to ~10 Mbps.

I suppose the "gsutil -m cp" command will not improve performance, since there is only one file to be copied per process.

What could explain this drop in speed as the number of gsutil cp processes copying single files increases? What can we do to increase the speed of these transfers?


Solution

  • gsutil can upload a single large file in parallel. It does this by uploading chunks of the file as separate temporary objects in GCS, asking GCS to compose them into the final object, and then deleting the temporary sub-objects.

    N.B. Because this involves uploading objects and then almost immediately deleting them, you shouldn't do this on Nearline buckets, since there's an extra charge for deleting objects that have been recently uploaded.

    You can set a file size above which gsutil will use this behavior. Try this:

    gsutil -o GSUtil:parallel_composite_upload_threshold=100M cp bigfile gs://your-bucket
    

    More documentation on the feature is available here: https://cloud.google.com/storage/docs/gsutil/commands/cp#parallel-composite-uploads
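
    If you want this behavior by default rather than per invocation, the same threshold can be set in the boto configuration file that gsutil reads (typically `~/.boto`). This is a sketch of the relevant stanza; the option name matches the `-o GSUtil:...` flag shown above:

        [GSUtil]
        parallel_composite_upload_threshold = 100M

    With this in place, a plain `gsutil cp bigfile gs://your-bucket` will use parallel composite uploads for any file larger than 100 MB.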