Tags: gcloud, gsutil, google-cloud-sdk

Is `gcloud storage` billing us for transfers of public data?


We're switching our scripts from gsutil to the reportedly faster gcloud storage. However, we access a significant amount of public data, for example from gs://gcp-public-data--broad-references.

We do NOT want to pay to download this public data. However, it appears that gcloud storage automatically sets the X-Goog-User-Project header for public transfers, while gsutil does not.

Is my understanding of the various documentation correct that gcloud storage is instructing GCS to bill us, and not the public bucket, for these transfers?

  1. Run gcloud version
    • On my machine this outputs Google Cloud SDK 407.0.0 and gsutil 5.15
  2. Run gcloud init
    • Log in
    • Select a google project
  3. Run gcloud config list
    • Verify the project you selected before has been configured
  4. Run gsutil -d ls gs://gcp-public-data--broad-references
    • Verify that the request Headers: do NOT contain X-Goog-User-Project
  5. Run gcloud --log-http storage ls gs://gcp-public-data--broad-references
    • Verify that under == headers start == your default project has been included as the X-Goog-User-Project
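To compare the two tools without reading the debug output by eye, the check in steps 4 and 5 can be sketched as a small filter. This is only a sketch: the function name `sends_user_project` is made up for illustration, and it assumes the header name appears literally (in any letter case) somewhere in the debug output, which may vary between SDK versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a captured HTTP debug log on stdin and reports
# whether the X-Goog-User-Project header appears anywhere in it.
# The match is case-insensitive because the two tools may print header
# names with different capitalization.
sends_user_project() {
  if grep -qi 'x-goog-user-project'; then
    echo "header sent"
  else
    echo "header not sent"
  fi
}

# Usage (assumes gsutil and gcloud are installed and authenticated):
#   gsutil -d ls gs://gcp-public-data--broad-references 2>&1 | sends_user_project
#   gcloud --log-http storage ls gs://gcp-public-data--broad-references 2>&1 | sends_user_project
```

The `2>&1` in the usage lines is there because debug output may be written to stderr rather than stdout.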

According to all the documentation I've been able to find, one should not set that header by default.

Via https://cloud.google.com/storage/docs/requester-pays:

Important: Buckets that have Requester Pays disabled still accept requests that include a billing project, and charges are applied to the billing project supplied in the request. Consider any billing implications prior to including a billing project in all of your requests.

Via https://cloud.google.com/storage/docs/xml-api/reference-headers#xgooguserproject:

The project specified in the header is billed for charges associated with the request. This header is used, for example, when making requests to buckets that have Requester Pays enabled.


Bonus:

  1. Run gsutil ls gs://gnomad-public-requester-pays
    • You should receive the error BadRequestException: 400 Bucket is a requester pays bucket but no user project provided.
  2. Run gcloud storage ls gs://gnomad-public-requester-pays
    • The bucket contents should be listed

The latter doesn't seem correct to me, as I never intentionally told gcloud storage which project to bill for the request.


Solution

  • Update: This behavior seems to have been fixed as of Google Cloud SDK 411.0.0, released 2022-12-06. As of that version, running the setup described in the original question no longer sends the X-Goog-User-Project header.

    Thanks @carbocation for the heads up about the fix!


    I heard back from a support member after this was reposted to the Google Cloud Community Forums.

    ErnestoC said:

    The default behavior of the Cloud CLI gcloud is to use the current project for all quota and billing operations. This is why you automatically see your project ID passed in X-Goog-User-Project. This behavior can be overridden though by adding the global --billing-project flag to any command.

    If you set this flag to an empty string, no project is passed in the request. I tested this with gcloud storage and confirmed that requester pays buckets return the expected error message (“400: Bucket is a requester pays bucket but no user project provided.”). Non-requester pays buckets allow operations as well.
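For SDK versions that still send the header by default, the workaround from the support reply can be wrapped in a small shell function. This is a minimal sketch: the function name `gcloud_no_billing` is hypothetical, and it relies on the behavior described in the reply above (an empty `--billing-project` value means no project is passed), which I have not verified across SDK versions.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: runs any gcloud command with an empty
# --billing-project value. Per the support reply, this should stop gcloud
# from passing a project in the X-Goog-User-Project header.
gcloud_no_billing() {
  gcloud --billing-project="" "$@"
}

# Usage (assumes gcloud is installed and authenticated):
#   gcloud_no_billing storage ls gs://gcp-public-data--broad-references
#   gcloud_no_billing --log-http storage ls gs://gnomad-public-requester-pays
```

If the wrapper works as described, the second usage line should fail with the requester pays error instead of listing the bucket, and adding `--log-http` lets you confirm that the header is absent.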