Search code examples
postgresqlgoogle-cloud-storagepostgisgoogle-cloud-sqlpostgis-raster

How can I use Google Cloud SQL PostGIS with out-db rasters in Google Cloud Storage?


I have PostgreSQL instance configured in Cloud SQL with the PostGIS and PostGIS raster extensions installed. I am storing vector data in-db and raster data as COGs in a bucket which are registered in the database as out-db rasters.

The Cloud SQL instance is using PostgreSQL 14.9 with PostGIS 3.2.5 and PostGIS Raster 3.2.3. I have set the required flags to enable out-db rasers:

  • postgis.enable_outdb_rasters=on
  • postgis.gdal_enabled_drivers=ENABLE_ALL

When the instance was set up Google auto-created a Cloud SQL service agent: p<PROJECT_NUMBER>-<IDENTIFIER>@gcp-sa-cloud-sql.iam.gserviceaccount.com. I have granted this agent the Storage Admin role at the project level and verified via the cloud console that the permissions have been set.

Additionally, I have a compute instance within the same project which uses a compute service account. From this VM I can sucessfully call gdalinfo /vsigs/path-to-cog.tif to access the raster data.

However, when executing e.g. SELECT ST_Value(data, 1, 1) from rasterdata limit 1; to grab a pixel value via PostGIS I get the error rt_band_load_offline_data: Cannot open offline raster: /vsigs...

simple-architecture

I have previously used the out-db pattern successfully with a local PostgreSQL instance accessing the remote COGs. In this scenario I explictly set the GOOGLE_APPLICATION_CREDENTIALS enivronment variable.

The error I'm getting appears to be related to permissions of the service agent and/or GDAL struggling to identify authentication details on the Cloud SQL instance.

I have followed some related advice provided here but the issue remains.

Is there something I am missing? Is there a way to set custom environment variables on the Cloud SQL instance to ensure GDAL can correctly authenticate?


Solution

  • This issue was indeed related to auth but not in the way I'd initially thought.

    Even though the CloudSQL instance is on the same VPC as the bucket of COGs, the GDAL virtual file (VSI) driver configuration must be set to properly authorise PostGIS to access the data.

    This requires setting a client id and secret via

    SET postgis.gdal_vsi_options = 'GS_ACCESS_KEY_ID=xxxx GS_SECRET_ACCESS_KEY=yyyy';
    

    Where GS_ACCESS_KEY_ID and GS_SECRET_ACCESS_KEY come from a Google HMAC key.

    However, performing this SET operation on the CloudSQL instance fails with:

    WARNING:  'gs_access_key_id' is not a legal VSI network file option
    ERROR:  invalid value for parameter "postgis.gdal_vsi_options": "GS_ACCESS_KEY_ID=xxxxxxxx"
    

    I have been able to identify this as a bug in the version of GDAL used by PostGIS in CloudSQL. Therefore I have logged an issue with Google and encourage anyone who is also impacted by this to upvote to increase visibilty.