I am trying to load and plot the daily wind speed at a specified location from the GFS0P25 dataset. I get the error "EEException: User memory limit exceeded." at the line "wind.select('u_component_of_wind_10m_above_ground').filterDate(i_date,f_date)".
I am aware of the Earth Engine memory limit. How can I improve the query so that I can load the daily average wind speed without exceeding that limit?
The problem is that there are many rows of data for each location at each time. I am already computing a daily mean later in the code, but that doesn't address the memory problem.
Thanks for your help!
Note: I've hidden the service account and credentials, so please use your own login. Thanks!
import ee
import pandas as pd
#service_account = 'xxx'
#credentials = ee.ServiceAccountCredentials(service_account, 'C:/Data/ee-xxxx.json')
#ee.Initialize(credentials)
# # Trigger the authentication flow.
ee.Authenticate()
# # Initialize the library.
ee.Initialize()
wind = ee.ImageCollection('NOAA/GFS0P25')
i_date = '2022-01-01'
f_date = '2022-07-01'
wind=wind.select('u_component_of_wind_10m_above_ground').filterDate(i_date,f_date) ####TRACEBACK HERE
u_lon = 21.450520
u_lat = 63.941972
u_poi = ee.Geometry.Point(u_lon, u_lat)
scale = 1000 # scale in meters
wind_u = wind.getRegion(u_poi, scale).getInfo()
wind_u[:5]
df = pd.DataFrame(wind_u)
headers = df.iloc[0]
df = pd.DataFrame(df.values[1:], columns=headers)
df['u_component_of_wind_10m_above_ground'] = pd.to_numeric(df['u_component_of_wind_10m_above_ground'], errors='coerce')
df['id'] = df['id'].str.slice(0,8)
df['id'] = pd.to_datetime(df['id'], format='%Y%m%d')
# Keep the columns of interest.
df = df[['id','u_component_of_wind_10m_above_ground']]
df=df.groupby('id').mean().reset_index()
import plotly.express as px
import webbrowser
fig = px.scatter(df, x="id", y="u_component_of_wind_10m_above_ground")
fig.show()
fig.write_html("c:/data/windchart.html")
webbrowser.open("c:/data/windchart.html")
According to the NOAA/GFS0P25 dataset description, a new model run is issued every 6 hours, and each run contains forecasts out to 384 hours ahead.
Your script therefore calls getInfo() on every forecast image of every run. That is roughly 6 (months) × 30 (days) × 4 (runs per day) × about 200 (forecast images per run), i.e. on the order of 150,000 values, which is above the limit.
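An estimate of this kind can be sketched in plain Python. The function names below are made up for illustration, and the forecast schedule (hourly steps up to 120 h, then 3-hourly steps up to 384 h, four runs per day) is an assumption based on my reading of the dataset description, so please check it against the catalog page before relying on the numbers.

```python
def forecast_steps_per_run(hourly_until=120, three_hourly_until=384):
    """Count forecast images in one run (assumed schedule, see above)."""
    hourly = range(0, hourly_until + 1)  # F000 .. F120, every hour
    three_hourly = range(hourly_until + 3, three_hourly_until + 1, 3)
    return len(hourly) + len(three_hourly)


def estimate_image_count(days, runs_per_day=4, steps_per_run=None):
    """Rough number of images (one value per image at a single point)."""
    if steps_per_run is None:
        steps_per_run = forecast_steps_per_run()
    return days * runs_per_day * steps_per_run


days = 6 * 30  # about six months
print(estimate_image_count(days))                   # every forecast image
print(estimate_image_count(days, steps_per_run=1))  # one image per run
```

Keeping a single image per run divides the count by the number of forecast steps, which is the idea behind filtering on 'F000'.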
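In your case, it looks like you want the daily average wind speed. Hence, I would first keep only the forecast-hour-0 image of each run (the images whose index contains 'F000') and then average those per day, as follows: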
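i_date = '2022-01-01'
f_date = '2022-04-01'
# Rebuild the collection for this date range, then keep one image per run.
wind = ee.ImageCollection('NOAA/GFS0P25').select('u_component_of_wind_10m_above_ground')
wind = wind.filterDate(i_date, f_date).filterMetadata('system:index', 'contains', 'F000')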
def resampler(coll, freq, unit, scale_factor, band_name):
    """
    This function aims to resample the time scale of an ee.ImageCollection.
    The function returns an ee.ImageCollection with the mean value of the
    band on the selected frequency.

    coll: (ee.ImageCollection) only one band can be handled
    freq: (int) corresponds to the resampling frequency
    unit: (str) corresponds to the resampling time unit.
        must be 'day', 'month' or 'year'
    scale_factor (float): scaling factor used to get our value in the right unit
    band_name (str) name of the output band
    """

    # Define initial and final dates of the collection.
    firstdate = ee.Date(
        coll.sort("system:time_start", True).first().get("system:time_start")
    )
    lastdate = ee.Date(
        coll.sort("system:time_start", False).first().get("system:time_start")
    )

    # Calculate the time difference between both dates.
    # https://developers.google.com/earth-engine/apidocs/ee-date-difference
    diff_dates = lastdate.difference(firstdate, unit)

    # Define a new time index (for output).
    new_index = ee.List.sequence(0, ee.Number(diff_dates), freq)

    # Define the function that will be applied to our new time index.
    def apply_resampling(date_index):
        # Define the starting date to take into account.
        startdate = firstdate.advance(ee.Number(date_index), unit)

        # Define the ending date to take into account according
        # to the desired frequency.
        enddate = firstdate.advance(ee.Number(date_index).add(freq), unit)

        # Calculate the composite image.
        image = (
            coll.filterDate(startdate, enddate)
            .mean()
            .multiply(scale_factor)
            .rename(band_name)
        )

        # Return the final image with the appropriate time index.
        return image.set("system:time_start", startdate.millis())

    # Map the function to the new time index.
    res = new_index.map(apply_resampling)

    # Transform the result into an ee.ImageCollection.
    res = ee.ImageCollection(res)

    return res
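To make the windowing explicit, here is a minimal local sketch of the same binning in plain Python, for the 'day' unit only. It is an analogy and not part of the Earth Engine code: daily_windows is a hypothetical helper that mirrors how new_index, startdate and enddate are built, so that each output image averages everything in the half-open window [start, end).

```python
from datetime import date, timedelta


def daily_windows(first, last, freq=1):
    """Return [start, end) windows of `freq` days covering first..last."""
    n_days = (last - first).days
    windows = []
    # Same offsets as ee.List.sequence(0, diff_dates, freq) for unit='day'.
    for offset in range(0, n_days + 1, freq):
        start = first + timedelta(days=offset)
        end = start + timedelta(days=freq)
        windows.append((start, end))
    return windows


for start, end in daily_windows(date(2022, 1, 1), date(2022, 1, 3)):
    print(start, "->", end)
```

With freq=1 there is one window per calendar day, so three months of data yield about 90 images instead of one image per forecast.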
Then apply the function as follows:
wind_d = resampler(wind, 1, "day", 1, "u_component_of_wind_10m_above_ground")
Then you'll be able to call wind_d.getRegion(u_poi, scale).getInfo().
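One thing to watch when reusing the plotting code from the question: the resampled images no longer carry the original GFS index in the id column, so the date is better taken from the time column (milliseconds since the epoch, set from system:time_start). A minimal sketch, assuming the usual getRegion layout with a header row of id, longitude, latitude, time and then the band; region_to_daily_frame is a hypothetical helper and the sample rows are invented for illustration:

```python
import pandas as pd


def region_to_daily_frame(rows, band):
    """Turn getRegion-style rows (header first) into a date/value table."""
    header, *records = rows
    df = pd.DataFrame(records, columns=header)
    df[band] = pd.to_numeric(df[band], errors="coerce")
    # 'time' holds milliseconds since the Unix epoch.
    df["date"] = pd.to_datetime(df["time"], unit="ms")
    return df[["date", band]].sort_values("date").reset_index(drop=True)


band = "u_component_of_wind_10m_above_ground"
# Invented sample standing in for wind_d.getRegion(u_poi, scale).getInfo().
sample = [
    ["id", "longitude", "latitude", "time", band],
    ["0", 21.45, 63.94, 1640995200000, 3.2],
    ["1", 21.45, 63.94, 1641081600000, None],
]
daily = region_to_daily_frame(sample, band)
print(daily)
```

The resulting frame can be passed to px.scatter with x="date" in place of x="id", and the groupby step is no longer needed because the averaging already happened on the server.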
I hope it will help.