amazon-s3 gdal rasterio

How to read s3 rasters with accompanying ".aux.xml" metadata file using rasterio?


Suppose a GeoTIFF raster in an S3 bucket has, next to the TIF file itself, an associated .aux.xml metadata file:

s3://my_s3_bucket/myraster.tif
s3://my_s3_bucket/myraster.tif.aux.xml

I'm trying to load this raster directly from the bucket using rasterio:

import rasterio

# `session` and `rio_gdal_options` are assumed to be defined earlier
fn = 's3://my_s3_bucket/myraster.tif'
with rasterio.Env(session, **rio_gdal_options):
    with rasterio.open(fn) as src:
        src_nodata = src.nodata
        scales = src.scales
        offsets = src.offsets
        bands = src.tags()['bands']

This is where the problem shows up. The raster file itself opens successfully, but rasterio does not automatically load the associated .aux.xml, so the metadata is never read: no band tags, and no proper scales or offsets.

I should add that doing exactly the same on a local file does work perfectly. The .aux.xml automatically gets picked up and all relevant metadata is correctly loaded.

Is there a way to make this work on S3 as well? If not, is there a workaround? Evidently the metadata was too large to be encoded in the TIF file itself, so rasterio (GDAL under the hood) generated the .aux.xml automatically when creating the raster.
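
One way to confirm whether the sidecar was actually picked up, in both the local and the S3 case, is to look at the list of files associated with the open dataset. The following is a minimal sketch, assuming that list comes from rasterio's src.files property; the helper name has_aux_sidecar is hypothetical and the two file lists are illustrative, not real output:

```python
def has_aux_sidecar(files):
    """Return True if any path in `files` looks like a .aux.xml sidecar.

    `files` is expected to be the sequence returned by `src.files`
    on an open rasterio dataset (an assumption of this sketch).
    """
    return any(str(f).lower().endswith(".aux.xml") for f in files)


# Illustrative file lists, not real output
without_sidecar = ["/vsis3/my_s3_bucket/myraster.tif"]
with_sidecar = [
    "/vsis3/my_s3_bucket/myraster.tif",
    "/vsis3/my_s3_bucket/myraster.tif.aux.xml",
]

print(has_aux_sidecar(without_sidecar))
print(has_aux_sidecar(with_sidecar))
```

Inside the with rasterio.open(fn) as src: block, this would be called as has_aux_sidecar(src.files).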


Solution

  • Finally got it to work. It turns out to be essential that, in the GDAL options passed to rasterio.Env, .xml is included as an allowed extension in CPL_VSIL_CURL_ALLOWED_EXTENSIONS.

    The documentation of this option states:

    Consider that only the files whose extension ends up with one that is listed in CPL_VSIL_CURL_ALLOWED_EXTENSIONS exist on the server.

    Almost all examples found online set only .tif as an allowed extension, because that can dramatically speed up file opening. The side effect is that rasterio/GDAL then treats any .aux.xml file as nonexistent.

    So if the .tif files are expected to have associated .aux.xml metadata files, the options need to allow .xml as well:

    rio_gdal_options = {
        'AWS_VIRTUAL_HOSTING': False,
        'AWS_REQUEST_PAYER': 'requester',
        'GDAL_DISABLE_READDIR_ON_OPEN': 'FALSE',
        'CPL_VSIL_CURL_ALLOWED_EXTENSIONS': '.tif,.xml',  # Adding .xml is essential!
        'VSI_CACHE': False
    }
    
    with rasterio.Env(session, **rio_gdal_options):
        # The associated .aux.xml file is now found and loaded automatically
        with rasterio.open(fn) as src:
            src_nodata = src.nodata
            scales = src.scales
            offsets = src.offsets
            bands = src.tags()['bands']
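
The effect of CPL_VSIL_CURL_ALLOWED_EXTENSIONS can be illustrated with a simplified model of the filter: a file is treated as existing only if its name ends with one of the listed extensions. This is a sketch of the documented behaviour, not GDAL's actual implementation; the function name is_visible is hypothetical, and the case-insensitive comparison is an assumption:

```python
def is_visible(filename, allowed_extensions):
    """Simplified model: True if `filename` ends with an allowed extension.

    `allowed_extensions` is the comma-separated string that would be
    passed as CPL_VSIL_CURL_ALLOWED_EXTENSIONS.
    """
    extensions = [e.strip().lower() for e in allowed_extensions.split(",") if e.strip()]
    return any(filename.lower().endswith(ext) for ext in extensions)


sidecar = "myraster.tif.aux.xml"

print(is_visible("myraster.tif", ".tif"))   # the raster itself passes the filter
print(is_visible(sidecar, ".tif"))          # the sidecar ends in .xml, so it is filtered out
print(is_visible(sidecar, ".tif,.xml"))     # adding .xml makes the sidecar visible
```

Note that the sidecar name contains ".tif" but does not end with it, which is why listing only .tif hides the .aux.xml file.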