I want to add a rate limiter to my Python Glue job's calls to DynamoDB (DDB), to mitigate its call-volume spikes. I implemented something like the following, as suggested at https://pypi.org/project/ratelimiter/ :
from ratelimiter import RateLimiter

# allow at most 10 calls per 1-second period
rate_limiter = RateLimiter(max_calls=10, period=1)

for i in range(100):
    with rate_limiter:
        do_something()
but got the following exception:
...rmation.doAs(UserGroupInformation.java:1844)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 1
    at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:106)
    at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
21/02/10 03:05:08 INFO ApplicationMaster: Deleting staging directory hdfs://20.0.18.119:8020/user/root/.sparkStaging/application_1612925905975_0002
21/02/10 03:05:08 INFO ShutdownHookManager: Shutdown hook called
End of LogType:stderr

LogType:stdout
Log Upload Time:Wed Feb 10 03:05:10 +0000 2021
LogLength:253
Log Contents:
Parse yarn logs get error message:
ModuleNotFoundError: No module named 'ratelimiter'
Traceback (most recent call last):
  File "script_2021-02-10-03-04-33.py", line 10, in <module>
    from ratelimiter import RateLimiter
ModuleNotFoundError: No module named 'ratelimiter'
End of LogType:stdout
How can I import ratelimiter?
Thanks!
Based on the comments: ratelimiter is not part of the Python standard library, so by default it is not available in a Glue job. However, we can add external libraries to the job, as explained in the AWS Glue documentation on providing your own Python libraries.
The process of adding an external library involves three steps (a sketch follows the list):

1. Create a .zip file containing the library (unless the library is contained in a single .py file).
2. Upload the zip to S3.
3. Use the library in a job or job run.
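Here is a minimal sketch of steps 2 and 3 using boto3. The bucket name, key, and job name are hypothetical placeholders; it assumes the zip was already built (step 1) and that your credentials have S3 and Glue permissions:

# Hypothetical bucket/key/job names; adjust to your environment.
import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

# Step 1 happens outside this script, e.g.:
#   pip install ratelimiter -t pkg && cd pkg && zip -r ../ratelimiter.zip .
# (ratelimiter ships as a single ratelimiter.py, so uploading that
# file directly also works.)

# Step 2: upload the zip to S3.
s3.upload_file("ratelimiter.zip", "my-bucket", "libs/ratelimiter.zip")

# Step 3: point the job at the library via the --extra-py-files
# special parameter, preserving the job's existing configuration.
job = glue.get_job(JobName="my-glue-job")["Job"]
args = job.get("DefaultArguments", {})
args["--extra-py-files"] = "s3://my-bucket/libs/ratelimiter.zip"
glue.update_job(
    JobName="my-glue-job",
    JobUpdate={
        "Role": job["Role"],
        "Command": job["Command"],
        "DefaultArguments": args,
    },
)

You can equivalently set the "Python library path" to the same S3 URI in the Glue console. On Glue 2.0 and later, you can also skip the zip entirely and pass --additional-python-modules ratelimiter to have Glue pip-install the package when the job starts.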