Tags: python, postgresql, amazon-web-services, amazon-s3, aws-glue

AWS Python Glue Job Not Importing Numeric Columns into RDS


I have a Glue job that reads a CSV file from an S3 bucket and imports the data into a Postgres RDS table, connecting to the database through a JDBC connection. The string/varchar columns are imported, but the numeric columns are not.
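Some context on why the numeric columns need special handling: a CSV file stores no column types, so every field arrives as text and any numeric type has to be inferred or cast downstream. The plain-Python sketch below illustrates this with the standard `csv` module. It is not Glue code; the sample rows and the helper names `read_rows` and `cast_columns` are made up for illustration, and only the column names come from the question's mapping.

```python
import csv
import io

# Hypothetical sample data; only the column names come from the question.
SAMPLE_CSV = """PROCESS_MAIN,DC,VOLUME,MINUTES
pick,DC1,120,45
pack,DC2,80,30
"""


def read_rows(text):
    """Parse CSV text with a header row; every value comes back as a str."""
    return list(csv.DictReader(io.StringIO(text)))


def cast_columns(rows, numeric_columns):
    """Return copies of the rows with the given columns converted to int."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the parsed rows stay untouched
        for col in numeric_columns:
            new_row[col] = int(new_row[col])
        out.append(new_row)
    return out


rows = read_rows(SAMPLE_CSV)
print(type(rows[0]["VOLUME"]).__name__)   # str
typed = cast_columns(rows, ["VOLUME", "MINUTES"])
print(type(typed[0]["VOLUME"]).__name__)  # int
```

Nothing in the file itself says that `VOLUME` is a number; the reader (here `csv.DictReader`, in the question Glue's CSV reader) has to decide that.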

Here are the Postgres RDS column types:

[Image: Table Column Data Types]

And here is the Python Glue script:

    import logging

    from awsglue.transforms import ApplyMapping

    logger = logging.getLogger(__name__)


    def __step_mapping_columns(self):

        # Script generated for node S3 bucket
        dynamicFrame_dept_summary = self.glueContext.create_dynamic_frame.from_options(
            format_options={"quoteChar": '"', "withHeader": True, "separator": ","},
            connection_type="s3",
            format="csv",
            connection_options={
                "paths": [
                    ""
                ],
                "recurse": True,
            },
            transformation_ctx="dynamicFrame_dept_summary",
        )

        # Each mapping is (source column, source type, target column, target type)
        mappings = [
            ("PROCESS_MAIN", "string", "process_main", "string"),
            ("PROCESS_CORE", "string", "process_core", "string"),
            ("DC", "string", "dc", "string"),
            ("BAG_SIZE", "string", "bag_size", "string"),
            ("EVENT_30_LOC", "string", "start_time_utc", "string"),
            ("VOLUME", "long", "box_volume", "long"),
            ("MINUTES", "long", "minutes", "long"),
            ("PLAN_MINUTES", "long", "plan_minutes", "long"),
            ("PLAN_RATE", "long", "plan_rate", "long"),
        ]

        # Script generated for node ApplyMapping
        applyMapping_dept_summary = ApplyMapping.apply(
            frame=dynamicFrame_dept_summary,
            mappings=mappings,
            transformation_ctx="applyMapping_dept_summary",
        )
        # Log the mapping list (defined above, so this no longer raises NameError)
        logger.info(mappings)

        return applyMapping_dept_summary
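
Each `ApplyMapping` entry reads as (source column, source type, target column, target type). The toy model below shows that idea on plain dicts: rename the field and convert the value to the target type. It is not Glue's implementation, `apply_mapping` and `CONVERTERS` are hypothetical names, and this model ignores the declared source type entirely. It also illustrates one option worth trying: declaring a CSV column as `"string"` on the source side and `"long"` on the target side, so the mapping performs the conversion; whether Glue behaves the same way for this job is an assumption to verify.

```python
from typing import Any, Callable, Dict, List, Tuple

# Toy converters keyed by the type names used in the question's mappings.
CONVERTERS: Dict[str, Callable[[Any], Any]] = {"string": str, "long": int}

# (source_name, source_type, target_name, target_type)
Mapping = Tuple[str, str, str, str]


def apply_mapping(records: List[Dict[str, Any]], mappings: List[Mapping]) -> List[Dict[str, Any]]:
    """Toy model: emit only mapped fields, renamed and converted to the target type."""
    result = []
    for record in records:
        out = {}
        # The declared source type is ignored in this toy model.
        for src_name, _src_type, dst_name, dst_type in mappings:
            value = record.get(src_name)
            # Missing values stay None; otherwise convert to the target type.
            out[dst_name] = None if value is None else CONVERTERS[dst_type](value)
        result.append(out)
    return result


mappings = [
    ("DC", "string", "dc", "string"),
    ("VOLUME", "string", "box_volume", "long"),  # read as text, written as a number
]
records = [{"DC": "DC1", "VOLUME": "120", "EXTRA": "x"}]
print(apply_mapping(records, mappings))  # [{'dc': 'DC1', 'box_volume': 120}]
```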



Does anyone know what the issue might be?


Solution

  • Figured it out. I needed to cast those columns to the long type before running ApplyMapping, because the DynamicFrame is unsure of their data type (the CSV reader can leave such columns as an ambiguous "choice" type).

    specs = [("VOLUME", "cast:long"), ("MINUTES", "cast:long"), ("PLAN_MINUTES", "cast:long"), ("PLAN_RATE", "cast:long")]
    dynamicFrame_dept_summary = dynamicFrame_dept_summary.resolveChoice(specs=specs)
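
The effect of a `cast:long` resolution on a column whose values came through with mixed types can be modeled in plain Python. This is a toy sketch, not Glue's implementation: `cast_long` and `resolve_choice_cast_long` are hypothetical names, and turning unparseable values into `None` is an assumption modeled on a Spark cast returning null.

```python
from typing import Any, Iterable, List, Optional


def cast_long(value: Any) -> Optional[int]:
    """Convert one value to int; return None when it cannot be parsed.

    Assumption: failed casts become None, similar to a Spark cast yielding null.
    """
    if value is None:
        return None
    if isinstance(value, int):
        return value
    try:
        return int(str(value).strip())
    except ValueError:
        return None


def resolve_choice_cast_long(values: Iterable[Any]) -> List[Optional[int]]:
    """Toy model of 'cast:long': force a mixed-type column to a single int type."""
    return [cast_long(v) for v in values]


# A column where some rows parsed as numbers and others stayed text.
mixed_volume = [120, "80", " 15 ", "", "N/A", None]
print(resolve_choice_cast_long(mixed_volume))  # [120, 80, 15, None, None, None]
```

After the cast every value is either an `int` or `None`, so a downstream step that expects `long` sees one consistent type.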