java python amazon-web-services aws-glue rate-limiting

How to import RateLimiter in AWS Glue Python

I want to rate-limit the calls my Python Glue job script makes to DynamoDB (DDB), to smooth out spikes in call volume. I implemented the following, as suggested in https://pypi.org/project/ratelimiter/ :

from ratelimiter import RateLimiter

rate_limiter = RateLimiter(max_calls=10, period=1)

for i in range(100):
    with rate_limiter:
        do_something()

but got the following exception:

rmation.doAs(UserGroupInformation.java:1844)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 1
    at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:106)
    at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
21/02/10 03:05:08 INFO ApplicationMaster: Deleting staging directory hdfs://20.0.18.119:8020/user/root/.sparkStaging/application_1612925905975_0002
21/02/10 03:05:08 INFO ShutdownHookManager: Shutdown hook called
End of LogType:stderr
LogType:stdout
Log Upload Time:Wed Feb 10 03:05:10 +0000 2021
LogLength:253
Log Contents:
Parse yarn logs get error message: ModuleNotFoundError: No module named 'ratelimiter'
Traceback (most recent call last):
  File "script_2021-02-10-03-04-33.py", line 10, in <module>
    from ratelimiter import RateLimiter
ModuleNotFoundError: No module named 'ratelimiter'
End of LogType:stdout

How can I import ratelimiter?

Thanks!

Solution

Based on the comments.

ratelimiter is not part of the Python standard library, so it is not available in a Glue job by default. However, external libraries can be added to the job, as explained in the AWS Glue documentation on using Python libraries.

The process of adding an external library involves three steps:

Create a .zip file containing the library (unless the library is a single .py file).
Upload the zip to S3.
Use the library in a job or job run.
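
The three steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names (build_library_zip, glue_job_arguments, upload_and_run), the bucket, the key, and the job name are all hypothetical, and it assumes the library has already been downloaded into a local folder. The zip is attached to the job through the --extra-py-files job parameter; the boto3 calls are kept in a separate function that is not invoked here, since they need AWS credentials.

```python
import os
import zipfile


def build_library_zip(package_dir, zip_path):
    """Step 1: zip a package folder so that the folder itself sits at the archive root."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".pyc"):
                    continue  # skip compiled bytecode files
                full_path = os.path.join(root, name)
                # Store paths relative to the parent, e.g. "mylib/__init__.py"
                arcname = os.path.relpath(full_path, parent).replace(os.sep, "/")
                zf.write(full_path, arcname)
                written.append(arcname)
    return sorted(written)


def glue_job_arguments(s3_uri):
    """Step 3: job arguments that point Glue at the uploaded zip."""
    return {"--extra-py-files": s3_uri}


def upload_and_run(zip_path, bucket, key, job_name):
    """Steps 2 and 3 with boto3 (requires AWS credentials; not called in this sketch)."""
    import boto3  # imported lazily so the rest of the sketch runs without boto3

    boto3.client("s3").upload_file(zip_path, bucket, key)
    return boto3.client("glue").start_job_run(
        JobName=job_name,  # hypothetical job name supplied by the caller
        Arguments=glue_job_arguments("s3://{}/{}".format(bucket, key)),
    )
```

With the zip attached this way, the original from ratelimiter import RateLimiter line should resolve inside the job. On Glue 2.0 and later, the --additional-python-modules job parameter (for example with the value ratelimiter) may be a simpler route, since it asks Glue to pip-install the module when the job starts.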