Tags: scala, amazon-web-services, apache-spark, etl, aws-glue

How to create dynamic data frame from S3 files in Glue Job in Scala?


I'm having trouble converting a Python Glue job to a Scala Glue job, specifically the create_dynamic_frame_from_options call. In Python the syntax is:

dyf = glueContext.create_dynamic_frame_from_options("s3",
                                                    {'paths': file_paths},
                                                    format="csv",
                                                    format_options={"separator": ",", "quoteChar": '"'})

where file_paths is a list such as ['s3://bucket1/file1.txt', 's3://bucket2/file2.txt']. How do I do the same thing in Scala?


Solution

  • Try this:

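    import com.amazonaws.services.glue.util.JsonOptions

    // S3 locations to read; replace with your own paths
    val file_paths = Array(
        "s3://bucket/data1",
        "s3://bucket/data2"
    )

    // Scala counterpart of create_dynamic_frame_from_options
    val dyf = glueContext.getSourceWithFormat(
        connectionType = "s3",
        options = JsonOptions(Map("paths" -> file_paths)),
        format = "csv",
        formatOptions = JsonOptions(Map("separator" -> ",", "quoteChar" -> "\""))
    ).getDynamicFrame()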
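
The snippet above assumes a glueContext value is already in scope. For context, a minimal sketch of a complete Scala job around it might look like the following. The object name GlueApp and the bucket paths are placeholders, not part of the original answer, and the sketch leaves out job bookmarks and argument parsing. It only runs inside a Glue environment where the Glue Scala library is on the classpath.

```scala
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.JsonOptions
import org.apache.spark.SparkContext

// Hypothetical entry point; Glue lets you configure the class name of the script.
object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    // Build the GlueContext from a SparkContext, as the Python job does implicitly.
    val sc = new SparkContext()
    val glueContext = new GlueContext(sc)

    // Placeholder S3 paths, mirroring the list from the question.
    val filePaths = Array(
      "s3://bucket1/file1.txt",
      "s3://bucket2/file2.txt"
    )

    // Read all listed files as CSV into a single DynamicFrame.
    val dyf = glueContext.getSourceWithFormat(
      connectionType = "s3",
      options = JsonOptions(Map("paths" -> filePaths)),
      format = "csv",
      formatOptions = JsonOptions(Map("separator" -> ",", "quoteChar" -> "\""))
    ).getDynamicFrame()

    // Quick sanity check: print the inferred schema.
    dyf.printSchema()
  }
}
```

The connection options map to the Python connection_options dict and formatOptions maps to format_options, so any other keys used in the Python job should carry over under the same names.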