Tags: python, pandas, amazon-web-services, aws-lambda, aws-glue

Use AWS Glue Python with NumPy and Pandas Python Packages


What is the easiest way to use packages such as NumPy and Pandas in AWS Glue, the new ETL tool on AWS? I have a completed Python script that uses NumPy and Pandas, and I would like to run it in AWS Glue.


Solution

  • I think the current answer is that you cannot. According to the AWS Glue documentation:

    Only pure Python libraries can be used. Libraries that rely on C extensions, such as the pandas Python Data Analysis Library, are not yet supported.

    However, even when I tried to include a library written in pure Python from S3, the Glue job failed because of an HDFS permission problem. If you find a way to solve this, please let me know as well.
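
Since the restriction quoted above comes down to whether a library ships compiled C extensions, a quick local check before uploading anything to S3 may save a failed job run. The following is a minimal sketch using only the standard library. The function names `find_compiled_files` and `looks_pure_python` are hypothetical helpers, not part of any AWS Glue API, and the suffix list is a heuristic assumption about how compiled modules are usually named.

```python
from pathlib import Path

# Assumed heuristic: file suffixes that usually indicate a compiled
# extension module or bundled native library.
BINARY_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def find_compiled_files(package_dir):
    """Return sorted relative paths of files that look like compiled binaries."""
    root = Path(package_dir)
    hits = []
    for path in root.rglob("*"):
        # str.endswith accepts a tuple, so one call covers every suffix.
        if path.is_file() and path.name.endswith(BINARY_SUFFIXES):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)


def looks_pure_python(package_dir):
    """True if no compiled-looking files were found under package_dir."""
    return not find_compiled_files(package_dir)
```

Pointing `find_compiled_files` at an unpacked copy of NumPy or Pandas should list their compiled modules, which is consistent with the documentation's statement that such libraries are not yet supported. An empty result only suggests that a package is pure Python; it does not guarantee that the Glue job will accept it.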