Tags: python, pandas, plotly, google-earth-engine

Earth Engine / Python / EEException: User memory limit exceeded


I am trying to load and plot the daily wind speed at a specified location from the GFS0P25 dataset. I get the error "EEException: User memory limit exceeded." at the line "wind.select('u_component_of_wind_10m_above_ground').filterDate(i_date,f_date)".

I am aware of Earth Engine's memory limit. How can I improve the query so that I can load the daily average wind speed without exceeding it?

The problem is that there are many rows of data for each location at each time. I already compute a daily mean later in the code, but that happens after the download, so it doesn't address the memory problem.

Thanks for your help!

Note: I've hidden the service account and credentials. Please use your own login, thanks!

import ee
import pandas as pd

#service_account = 'xxx'
#credentials = ee.ServiceAccountCredentials(service_account, 'C:/Data/ee-xxxx.json')
#ee.Initialize(credentials)

# # Trigger the authentication flow.
ee.Authenticate()

# # Initialize the library.
ee.Initialize()

wind = ee.ImageCollection('NOAA/GFS0P25')
                                                
i_date = '2022-01-01'
f_date = '2022-07-01'
wind=wind.select('u_component_of_wind_10m_above_ground').filterDate(i_date,f_date) ####TRACEBACK HERE

u_lon = 21.450520
u_lat = 63.941972
u_poi = ee.Geometry.Point(u_lon, u_lat)
scale = 1000  # scale in meters
wind_u = wind.getRegion(u_poi, scale).getInfo()
wind_u[:5]

df = pd.DataFrame(wind_u)
headers = df.iloc[0]
df = pd.DataFrame(df.values[1:], columns=headers)
df['u_component_of_wind_10m_above_ground'] = pd.to_numeric(df['u_component_of_wind_10m_above_ground'], errors='coerce')

df['id'] = df['id'].str.slice(0,8)
df['id'] = pd.to_datetime(df['id'], format='%Y%m%d')
# Keep the columns of interest.
df = df[['id','u_component_of_wind_10m_above_ground']]
df=df.groupby('id').mean().reset_index()

import plotly.express as px
import webbrowser
fig = px.scatter(df, x="id", y="u_component_of_wind_10m_above_ground")
fig.show()
fig.write_html("c:/data/windchart.html")
webbrowser.open("c:/data/windchart.html")

Solution

  • According to the NOAA/GFS0P25 dataset description, a new forecast reaching up to 384 hours ahead is issued every 6 hours, so every model run adds many images to the collection.

    With your script, this means you are calling getInfo() on a series of roughly 6 (months) * 30 (days) * 4 (runs per day) * 384 (forecast entries per run, as an upper bound) = 276,480 values, which is above the limit.
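A quick back-of-the-envelope check shows why filtering and daily averaging matter. The figures below are assumptions for illustration (30-day months, four runs per day, 384 entries per run as an upper bound), not exact counts taken from the catalog, and `estimated_values` is a hypothetical helper:

```python
def estimated_values(days, runs_per_day, entries_per_run):
    """Rough number of values requested for a single point and band."""
    return days * runs_per_day * entries_per_run


days = 6 * 30  # six months, assuming 30-day months

# Every forecast entry of every run (assumed upper bound of 384 per run).
all_entries = estimated_values(days, runs_per_day=4, entries_per_run=384)

# Only the first entry (forecast hour 0) of each run.
first_entry_only = estimated_values(days, runs_per_day=4, entries_per_run=1)

# One averaged image per day.
daily_mean = estimated_values(days, runs_per_day=1, entries_per_run=1)

print(all_entries, first_entry_only, daily_mean)
```

Under these assumptions, the request shrinks by several orders of magnitude once only one value per day is left.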

    In your case, it looks like you want the daily average wind speed. Hence, I would proceed as follows:

    • shorten your period of interest a little (say you work on 3-month periods):
    i_date = '2022-01-01'
    f_date = '2022-04-01'
    
    • keep only the first forecast (forecast hour 0) of each model run by filtering on the image IDs:
    wind = wind.filter(ee.Filter.stringContains('system:index', 'F000'))
    
    • resample the dataset to a daily time step by taking the daily average. You can see an example of how to do that here. In your case, it gives:
    def resampler(coll, freq, unit, scale_factor, band_name):
        """
        Resample the time scale of an ee.ImageCollection.
        The function returns an ee.ImageCollection holding the mean value of
        the band over each period of the selected frequency.
    
        coll: (ee.ImageCollection) only one band can be handled
        freq: (int) resampling frequency, expressed in `unit`
        unit: (str) resampling time unit;
                    must be 'day', 'month' or 'year'
        scale_factor: (float) scaling factor used to convert values to the desired unit
        band_name: (str) name of the output band
        """
    
        # Define initial and final dates of the collection.
        firstdate = ee.Date(
            coll.sort("system:time_start", True).first().get("system:time_start")
        )
    
        lastdate = ee.Date(
            coll.sort("system:time_start", False).first().get("system:time_start")
        )
    
        # Calculate the time difference between both dates.
        # https://developers.google.com/earth-engine/apidocs/ee-date-difference
        diff_dates = lastdate.difference(firstdate, unit)
    
        # Define a new time index (for output).
        new_index = ee.List.sequence(0, ee.Number(diff_dates), freq)
    
        # Define the function that will be applied to our new time index.
        def apply_resampling(date_index):
            # Define the starting date to take into account.
            startdate = firstdate.advance(ee.Number(date_index), unit)
    
            # Define the ending date to take into account according
            # to the desired frequency.
            enddate = firstdate.advance(ee.Number(date_index).add(freq), unit)
    
            # Calculate the composite image.
            image = (
                coll.filterDate(startdate, enddate)
                .mean()
                .multiply(scale_factor)
                .rename(band_name)
            )
    
            # Return the final image with the appropriate time index.
            return image.set("system:time_start", startdate.millis())
    
        # Map the function to the new time index.
        res = new_index.map(apply_resampling)
    
        # Transform the result into an ee.ImageCollection.
        res = ee.ImageCollection(res)
    
        return res
    

    Then apply the function as follows:

    wind_d = resampler(wind, 1, "day", 1, "u_component_of_wind_10m_above_ground")
    

    You should then be able to run wind_d.getRegion(u_poi, scale).getInfo() without hitting the memory limit.
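One thing to watch when converting the result to pandas: the resampled collection is built from a list, so its id column probably no longer carries the YYYYMMDD prefix that the question's code slices (this is an assumption about how Earth Engine indexes such collections). A safer route is the time column, which getRegion fills from system:time_start in milliseconds. A minimal sketch, where `region_to_daily_frame` and the sample rows are hypothetical:

```python
import pandas as pd


def region_to_daily_frame(rows, band):
    """Convert the list returned by getRegion(...).getInfo() to a DataFrame.

    rows: list of lists; the first row holds the column names
          (id, longitude, latitude, time, <band>).
    band: name of the band column to keep.
    """
    df = pd.DataFrame(rows[1:], columns=rows[0])
    # Non-numeric or missing values become NaN.
    df[band] = pd.to_numeric(df[band], errors="coerce")
    # 'time' is in milliseconds since the Unix epoch.
    df["date"] = pd.to_datetime(df["time"], unit="ms")
    # Drop days without data and keep only the columns of interest.
    return df[["date", band]].dropna().reset_index(drop=True)


# Made-up sample shaped like a getRegion result (not real GFS values).
band = "u_component_of_wind_10m_above_ground"
sample = [
    ["id", "longitude", "latitude", "time", band],
    ["0", 21.45, 63.94, 1640995200000, 3.2],
    ["1", 21.45, 63.94, 1641081600000, None],
    ["2", 21.45, 63.94, 1641168000000, -1.5],
]
daily = region_to_daily_frame(sample, band)
print(daily)
```

With real data you would pass the list returned by getInfo() in place of `sample`, and plot `date` against the band column. No groupby is needed any more, since the averaging already happened on the server.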

    I hope this helps.
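If you still need the full six months, one option is to split the period into 3-month chunks, run the steps above on each chunk, and concatenate the results. The sketch below only builds the date pairs; `split_period` is a hypothetical helper, and it relies on filterDate treating the end date as exclusive so that consecutive chunks do not overlap:

```python
import pandas as pd


def split_period(start, end, months=3):
    """Split [start, end) into consecutive chunks of at most `months` months.

    Returns a list of (chunk_start, chunk_end) strings in YYYY-MM-DD format.
    """
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    chunks = []
    chunk_start = start
    while chunk_start < end:
        # Never run past the requested end date.
        chunk_end = min(chunk_start + pd.DateOffset(months=months), end)
        chunks.append(
            (chunk_start.strftime("%Y-%m-%d"), chunk_end.strftime("%Y-%m-%d"))
        )
        chunk_start = chunk_end
    return chunks


chunks = split_period("2022-01-01", "2022-07-01")
print(chunks)
```

Each pair can then be passed to filterDate before resampling, and the per-chunk DataFrames combined with pd.concat.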