python pandas plotly google-earth-engine

Earth Engine / Python / EEException: User memory limit exceeded

I am trying to load and plot the daily wind speed at a specified location from the GFS0P25 dataset. I get the error "EEException: User memory limit exceeded." at the line "wind.select('u_component_of_wind_10m_above_ground').filterDate(i_date,f_date)".

I am aware of Earth Engine's memory limit. How can I improve the query so that I can load the daily average wind speed without exceeding it?

The problem is that there are many rows of data for each location at each time. I already compute a daily mean later in the code, but that doesn't address the memory problem.

Thanks for your help!

Note: I've hidden the service account and credentials. Please use your own login, thanks!

import ee
import pandas as pd

#service_account = 'xxx'
#credentials = ee.ServiceAccountCredentials(service_account, 'C:/Data/ee-xxxx.json')
#ee.Initialize(credentials)

# # Trigger the authentication flow.
ee.Authenticate()

# # Initialize the library.
ee.Initialize()

wind = ee.ImageCollection('NOAA/GFS0P25')
                                                
i_date = '2022-01-01'
f_date = '2022-07-01'
wind=wind.select('u_component_of_wind_10m_above_ground').filterDate(i_date,f_date) ####TRACEBACK HERE

u_lon = 21.450520
u_lat = 63.941972
u_poi = ee.Geometry.Point(u_lon, u_lat)
scale = 1000  # scale in meters
wind_u = wind.getRegion(u_poi, scale).getInfo()
wind_u[:5]

df = pd.DataFrame(wind_u)
headers = df.iloc[0]
df = pd.DataFrame(df.values[1:], columns=headers)
df['u_component_of_wind_10m_above_ground'] = pd.to_numeric(df['u_component_of_wind_10m_above_ground'], errors='coerce')

df['id'] = df['id'].str.slice(0,8)
df['id'] = pd.to_datetime(df['id'], format='%Y%m%d')
# Keep the columns of interest.
df = df[['id','u_component_of_wind_10m_above_ground']]
df=df.groupby('id').mean().reset_index()

import plotly.express as px
import webbrowser
fig = px.scatter(df, x="id", y="u_component_of_wind_10m_above_ground")
fig.show()
fig.write_html("c:/data/windchart.html")
webbrowser.open("c:/data/windchart.html")

Solution

According to the NOAA/GFS0P25 dataset description, 384 predictions are given every 6 hours.

Given your script, this means you are calling getInfo() on a series of roughly 6 (months) × 30 (days) × 4 (runs per day, one every 6 hours) × 384 (entries) = 276,480 values, which is above the limit.

In your case, it looks like you want the daily average of the wind speed. Hence, I would proceed as follows:

shorten your period of interest a little (say, work on 3-month periods):

i_date = '2022-01-01'
f_date = '2022-04-01'

keep only the first forecast (the F000 entry) for each time step by filtering on the image IDs:

wind = wind.filterMetadata('system:index', 'contains', 'F000')

resample the time resolution of your dataset by taking the daily average. You can see an example of how to do that here. In your case, it gives:

def resampler(coll, freq, unit, scale_factor, band_name):
    """
    This function aims to resample the time scale of an ee.ImageCollection.
    The function returns an ee.ImageCollection with the mean value of the
    band on the selected frequency.

    coll: (ee.ImageCollection) only one band can be handled
    freq: (int) the resampling frequency
    unit: (str) the resampling time unit;
                must be 'day', 'month' or 'year'
    scale_factor: (float) scaling factor used to convert the value to the desired unit
    band_name: (str) name of the output band
    """

    # Define initial and final dates of the collection.
    firstdate = ee.Date(
        coll.sort("system:time_start", True).first().get("system:time_start")
    )

    lastdate = ee.Date(
        coll.sort("system:time_start", False).first().get("system:time_start")
    )

    # Calculate the time difference between both dates.
    # https://developers.google.com/earth-engine/apidocs/ee-date-difference
    diff_dates = lastdate.difference(firstdate, unit)

    # Define a new time index (for output).
    new_index = ee.List.sequence(0, ee.Number(diff_dates), freq)

    # Define the function that will be applied to our new time index.
    def apply_resampling(date_index):
        # Define the starting date to take into account.
        startdate = firstdate.advance(ee.Number(date_index), unit)

        # Define the ending date to take into account according
        # to the desired frequency.
        enddate = firstdate.advance(ee.Number(date_index).add(freq), unit)

        # Calculate the composite image.
        image = (
            coll.filterDate(startdate, enddate)
            .mean()
            .multiply(scale_factor)
            .rename(band_name)
        )

        # Return the final image with the appropriate time index.
        return image.set("system:time_start", startdate.millis())

    # Map the function to the new time index.
    res = new_index.map(apply_resampling)

    # Transform the result into an ee.ImageCollection.
    res = ee.ImageCollection(res)

    return res

Then apply the function as follows:

wind_d = resampler(wind, 1, "day", 1, "u_component_of_wind_10m_above_ground")
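If you still need the full six months, one option is to split the period into shorter windows and run the steps above on each window in turn. The helper below is only a sketch: split_period is a hypothetical name, not part of the Earth Engine API, and it assumes the start date falls on a day that exists in every month (such as the 1st). The end of each window is exclusive, which matches how filterDate treats its end date.

```python
from datetime import date


def split_period(start, end, months=3):
    """Split [start, end) into consecutive windows of at most `months` months.

    `start` and `end` are ISO date strings. Each window is returned as a
    (start, end) pair of ISO strings with an exclusive end.
    Assumption: the day of `start` exists in every month (e.g. the 1st).
    """
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    windows = []
    cursor = first
    while cursor < last:
        # Advance the cursor by `months` calendar months.
        month_index = cursor.year * 12 + (cursor.month - 1) + months
        nxt = date(month_index // 12, month_index % 12 + 1, cursor.day)
        # Never run past the requested end date.
        nxt = min(nxt, last)
        windows.append((cursor.isoformat(), nxt.isoformat()))
        cursor = nxt
    return windows


windows = split_period("2022-01-01", "2022-07-01")
print(windows)
# [('2022-01-01', '2022-04-01'), ('2022-04-01', '2022-07-01')]
```

Each pair can be passed to filterDate(start, end) before the filtering and resampling steps, and the per-window results concatenated afterwards.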

Then you'll be able to call wind_d.getRegion(u_poi, scale).getInfo().
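One thing to watch when converting the result to a DataFrame: after resampling, the id column may no longer start with a date, so slicing it as in the question is not reliable. The time column (system:time_start in milliseconds) is a safer source for the date. A minimal sketch, assuming the usual getRegion layout of a header row followed by data rows; region_to_daily_frame is a hypothetical helper and the sample rows are made-up values used only to show the shape of the data:

```python
import pandas as pd


def region_to_daily_frame(rows, band):
    """Convert the list of lists from getRegion(...).getInfo() to daily means.

    rows[0] is the header row (id, longitude, latitude, time, <band>);
    `time` holds system:time_start in milliseconds since the epoch.
    """
    df = pd.DataFrame(rows[1:], columns=rows[0])
    # Missing values come back as None; coerce them to NaN.
    df[band] = pd.to_numeric(df[band], errors="coerce")
    # Build the calendar day from the millisecond timestamp.
    df["date"] = pd.to_datetime(df["time"], unit="ms").dt.floor("D")
    return df.groupby("date", as_index=False)[band].mean()


band = "u_component_of_wind_10m_above_ground"
# Synthetic rows for illustration only (not real GFS values).
rows = [
    ["id", "longitude", "latitude", "time", band],
    ["0", 21.45, 63.94, 1640995200000, 2.0],  # 2022-01-01 00:00 UTC
    ["1", 21.45, 63.94, 1641016800000, 4.0],  # 2022-01-01 06:00 UTC
    ["2", 21.45, 63.94, 1641081600000, 1.0],  # 2022-01-02 00:00 UTC
]
daily = region_to_daily_frame(rows, band)
print(daily)
```

The resulting frame has one row per day, so it can be passed to px.scatter with x="date" in place of x="id".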

I hope this helps.