Suppose a GeoTIFF raster in an S3 bucket has, next to the raw TIF file, an associated .aux.xml metadata file:
s3://my_s3_bucket/myraster.tif
s3://my_s3_bucket/myraster.tif.aux.xml
I'm trying to load this raster directly from the bucket using rasterio:
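import rasterio

fn = 's3://my_s3_bucket/myraster.tif'
# session (AWS session) and rio_gdal_options (dict of GDAL config options) are defined elsewhere
with rasterio.Env(session, **rio_gdal_options):
    with rasterio.open(fn) as src:
        src_nodata = src.nodata
        scales = src.scales
        offsets = src.offsets
        bands = src.tags()['bands']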
And this seems to be a problem. The raster file itself is opened successfully, but because rasterio does not automatically load the associated .aux.xml, the metadata is never loaded. As a result there are no band tags and no proper scales and offsets.
I should add that doing exactly the same on a local file works perfectly. The .aux.xml is picked up automatically and all relevant metadata is loaded correctly.
Is there a way to make this work on S3 as well? And if not, is there a workaround for this problem? Apparently the metadata was too large to be embedded in the TIF file itself, so rasterio (GDAL under the hood) generated the .aux.xml automatically when creating the raster.
Finally got it to work. It turns out to be essential that, in the GDAL options passed to rasterio.Env, .xml is added as an allowed extension in CPL_VSIL_CURL_ALLOWED_EXTENSIONS.
The documentation of this option states:
Consider that only the files whose extension ends up with one that is listed in CPL_VSIL_CURL_ALLOWED_EXTENSIONS exist on the server.
Almost all examples found online set only .tif as the allowed extension, because this can dramatically speed up file opening. With that setting, however, any .aux.xml files are invisible to rasterio/GDAL.
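To make the effect of the quoted rule concrete, here is a minimal pure-Python sketch of a suffix filter. It is only a simplified model of the documented behaviour, not GDAL's actual implementation; the function name is_visible is hypothetical, and the case-insensitive comparison is an assumption.

```python
def is_visible(path: str, allowed_extensions: str) -> bool:
    """Simplified model of the CPL_VSIL_CURL_ALLOWED_EXTENSIONS rule.

    A file is treated as existing only if its name ends with one of the
    comma-separated suffixes. Case-insensitive matching is an assumption.
    """
    suffixes = [ext.strip().lower() for ext in allowed_extensions.split(",") if ext.strip()]
    return any(path.lower().endswith(suffix) for suffix in suffixes)


tif = "s3://my_s3_bucket/myraster.tif"
sidecar = "s3://my_s3_bucket/myraster.tif.aux.xml"

print(is_visible(tif, ".tif"))           # True
print(is_visible(sidecar, ".tif"))       # False: the sidecar is filtered out
print(is_visible(sidecar, ".tif,.xml"))  # True: .xml matches the end of .aux.xml
```

Under this model, listing .xml is enough to cover .aux.xml, because only the end of the file name is compared.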
So if we expect associated .aux.xml metadata files to sit next to the .tif files, we have to change our example to:
rio_gdal_options = {
    'AWS_VIRTUAL_HOSTING': False,
    'AWS_REQUEST_PAYER': 'requester',
    'GDAL_DISABLE_READDIR_ON_OPEN': 'FALSE',
    'CPL_VSIL_CURL_ALLOWED_EXTENSIONS': '.tif,.xml',  # Adding .xml is essential!
    'VSI_CACHE': False
}
with rasterio.Env(session, **rio_gdal_options):
    # The associated .aux.xml file is now found and loaded automatically
    with rasterio.open(fn) as src:
        src_nodata = src.nodata
        scales = src.scales
        offsets = src.offsets
        bands = src.tags()['bands']
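One way to confirm that the sidecar was actually picked up, rather than inferring it from the tags, is to inspect the list of files the dataset reports. The helper below is a hypothetical sketch (find_sidecars is not part of rasterio); it only filters a list of file names, and the sample paths are made up for illustration.

```python
from typing import Iterable, List


def find_sidecars(files: Iterable[str], suffix: str = ".aux.xml") -> List[str]:
    """Return the entries in `files` that look like PAM sidecar files."""
    return [f for f in files if f.lower().endswith(suffix)]


# Hypothetical file lists, standing in for what an opened dataset might report
with_sidecar = ["myraster.tif", "myraster.tif.aux.xml"]
without_sidecar = ["myraster.tif"]

print(find_sidecars(with_sidecar))     # ['myraster.tif.aux.xml']
print(find_sidecars(without_sidecar))  # []
```

Inside the with rasterio.open(fn) as src: block, the list can be taken from src.files, so find_sidecars(src.files) coming back empty would suggest the .aux.xml is still not visible to GDAL.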