Google Earth Engine download problems: is this caused by immutable server-side objects?


I have a function that downloads an image collection as TFRecord or GeoTIFF files.

Here's the function:

def download_image_collection_to_drive(collection, aois, bands, limit, export_format):
    if collection.size().lt(ee.Number(limit)):
        bands = [band for band in bands if band not in ['SCL', 'QA60']]
        for aoi in aois:
            cluster = aoi.get('cluster').getInfo()
            geom = aoi.bounds().getInfo()['geometry']['coordinates']
            aoi_collection = collection.filterMetadata('cluster', 'equals', cluster)

            for ts in range(1, 11):
                print(ts)
                ts_collection = aoi_collection.filterMetadata('interval', 'equals', ts)
                if ts_collection.size().eq(ee.Number(1)):
                    image = ts_collection.first()
                    p_id = image.get("PRODUCT_ID").getInfo()
                    description = f'{cluster}_{ts}_{p_id}'
                    task_config = {
                        'fileFormat': export_format,
                        'image': image.select(bands),
                        'region': geom,
                        'description': description,
                        'scale': 10,
                        'folder': 'output'
                    }
                    if export_format == 'TFRecord':
                        task_config['formatOptions'] = {'patchDimensions': [256, 256], 'kernelSize': [3, 3]}
                    task = ee.batch.Export.image.toDrive(**task_config)
                    task.start()
                else:
                    logger.warning(f'no image for interval {ts}')
    else:
        logger.warning(f'collection over {limit} aborting drive download')

It seems to fail whenever it gets to the second aoi. I'm confused by this, because if ts_collection.size().eq(ee.Number(1)) should confirm that there is an image there, so it should be able to get the product ID from it.

line 24, in download_image_collection_to_drive
    p_id = image.get("PRODUCT_ID").getInfo()
  File "/lib/python3.7/site-packages/ee/computedobject.py", line 95, in getInfo
    return data.computeValue(self)
  File "/lib/python3.7/site-packages/ee/data.py", line 717, in computeValue
    prettyPrint=False))['result']
  File "/lib/python3.7/site-packages/ee/data.py", line 340, in _execute_cloud_call
    raise _translate_cloud_exception(e)
ee.ee_exception.EEException: Element.get: Parameter 'object' is required.

Am I falling foul of immutable server-side objects somewhere?


Solution

  • This is a server-side value problem, yes, but immutability has nothing to do with it: your if statement isn't working as you intend.

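    ts_collection.size().eq(ee.Number(1)) is a server-side value: you've described a comparison that hasn't happened yet. A local operation like a Python if statement therefore cannot take the comparison's outcome into account; it just treats the object itself as true, so the branch always runs. When the filtered collection turns out to be empty, ts_collection.first() has no image to return, which is most likely why Element.get then complains that its 'object' parameter is missing.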
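To see why the branch always runs, here is a minimal pure-Python sketch. LazyComparison and get_info are hypothetical stand-ins invented for illustration, not Earth Engine classes or methods. The only assumption carried over is that the object describing the comparison defines no truth value of its own, so Python treats it as true.

```python
class LazyComparison:
    """Hypothetical stand-in for a server-side value; not an Earth Engine class."""

    def __init__(self, left, right):
        # Only records what should be compared; nothing is evaluated here.
        self.left = left
        self.right = right

    def get_info(self):
        # Stand-in for the round trip that actually evaluates the comparison.
        return self.left == self.right


# "Is the collection size equal to 1?" for a collection of size 0.
pending = LazyComparison(0, 1)

# A plain `if` only sees an ordinary object, which Python treats as true.
branch_taken = True if pending else False

# Only evaluating the comparison yields the real answer.
evaluated = pending.get_info()

print(branch_taken, evaluated)
```

The branch is taken even though the evaluated comparison is false, which mirrors the if statement in the question running for intervals that have no image.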

    Using getInfo would be a quick fix:

    if ts_collection.size().eq(ee.Number(1)).getInfo():
    

    but it would be more efficient to avoid using getInfo more than needed by fetching the entire collection's info just once, which includes the image info.

    ...
    ts_collection_info = ts_collection.getInfo()
    if ts_collection_info['features']:  # Are there any images in the collection?
        image = ts_collection.first()
        image_info = ts_collection_info['features'][0]  # client-side image info already downloaded
        p_id = image_info['properties']['PRODUCT_ID']  # get ID from client-side info
        ...
    

    This way, you only make two requests per ts: one to check for the match, and one to start the export.

    Note that I haven't actually run this Python code, and there might be some small mistakes; if it gives you any trouble, print(ts_collection_info) and examine the structure you actually received to figure out how to interpret it.
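The client-side check can also be pulled into a small helper that works only on the dictionary already downloaded. This is a sketch, not tested against a live collection: single_product_id is a hypothetical name, and sample_info is a hand-written stand-in that assumes getInfo() on an image collection returns a 'features' list whose entries carry a 'properties' dictionary. Unlike the snippet above, it keeps the question's original "exactly one image" condition.

```python
from typing import Any, Dict, Optional


def single_product_id(collection_info: Dict[str, Any]) -> Optional[str]:
    """Return PRODUCT_ID if the fetched info holds exactly one image, else None."""
    features = collection_info.get('features', [])
    if len(features) != 1:
        return None
    # Missing 'properties' or 'PRODUCT_ID' also yields None instead of raising.
    return features[0].get('properties', {}).get('PRODUCT_ID')


# Hand-written stand-in for what ts_collection.getInfo() is assumed to return.
sample_info = {
    'type': 'ImageCollection',
    'features': [
        {'type': 'Image', 'properties': {'PRODUCT_ID': 'EXAMPLE_ID'}},
    ],
}
empty_info = {'type': 'ImageCollection', 'features': []}

print(single_product_id(sample_info))
print(single_product_id(empty_info))
```

In the download loop, a None result would take the place of the else branch that logs the missing interval.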