Tags: google-cloud-platform, cloud, gsutil, google-cloud-storage

Is there a way to perform resumable uploads using gsutil cp (not REST APIs) across different machines?


When gsutil cp fails, we can resume the upload by re-running the exact same command on the same machine (the session is tracked via files under ~/.gsutil/tracker-files). However, when we switch machines, that directory doesn't exist, so re-running the same command starts a new session instead of resuming the old one. See Resumable Uploads (CLI) and Background On Resumable Transfers.

Alternatively, I see that with Resumable Uploads (REST APIs), we can generate a session URI and pass this URI to different commands. However, I don't see the option to pass in a session URI in the gsutil cp options listed here.

With this information, is sharing the ~/.gsutil/tracker-files directory (e.g., as a shared volume) among the machines the only way to resume a session via the gsutil CLI?


Solution

  • Both gsutil and gcloud alpha storage use local tracker files to handle resuming uploads, so yes, you'd need to copy those files to a new machine if you wanted to resume the operation elsewhere.

    As you noted, the API itself provides a "session URI" that you can use to query the upload progress and resume the upload from anywhere, but I don't believe specifying it explicitly is an option in either command line utility.

    Our client libraries do support it, though. If recovering uploads from different machines is a regular part of your workflow, you could perhaps write a small, custom uploader program. Here's an example of how to use resumable uploads with the C++ client. The key section is:

        gcs::ObjectWriteStream stream = client.WriteObject(
            bucket_name, object_name,
            gcs::RestoreResumableUploadSession(session_id));
        std::cout << "I should start writing from byte "
                  << stream.next_expected_byte();
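
    If recovering uploads across machines is routine, the two halves of that workflow could look roughly like the sketch below. This is a minimal, untested outline assuming a reasonably recent google-cloud-cpp; StartUpload and ResumeUpload are hypothetical helpers, and the bucket/object names and the mechanism for sharing the session ID between machines are placeholders.

        #include "google/cloud/storage/client.h"
        #include <iostream>
        #include <string>
        #include <utility>

        namespace gcs = google::cloud::storage;

        // Machine A: start a resumable upload and capture its session ID.
        // How the ID is persisted (file, database, queue) is up to you.
        std::string StartUpload(gcs::Client& client, std::string const& bucket,
                                std::string const& object) {
          gcs::ObjectWriteStream stream = client.WriteObject(
              bucket, object, gcs::NewResumableUploadSession());
          std::string session_id = stream.resumable_session_id();
          // ... optionally write some bytes here ...
          // Suspend() hands the session off without finalizing the object,
          // so another machine can pick it up later.
          std::move(stream).Suspend();
          return session_id;
        }

        // Machine B: restore the session by ID and continue writing from
        // the offset the service reports.
        void ResumeUpload(gcs::Client& client, std::string const& bucket,
                          std::string const& object,
                          std::string const& session_id) {
          gcs::ObjectWriteStream stream = client.WriteObject(
              bucket, object, gcs::RestoreResumableUploadSession(session_id));
          std::cout << "Resuming from byte " << stream.next_expected_byte()
                    << "\n";
          // ... write the remaining bytes starting at that offset ...
          stream.Close();  // finalizes the object once all bytes are written
        }

    Since the session ID is just a string, handing it between machines is much lighter-weight than replicating the ~/.gsutil/tracker-files directory.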