We have hundreds of Glue Jobs that move data from S3 and RDS to Redshift. When a new data is generated from the source systems and then moved to Redshift, we need to perform an upsert i.e insert new records and update existing records to ensure we don't have duplicates in the Redshift tables.
The issue is that when an upsert is done the Glue Job (using the Visual Studio), a temp table is created. The issue with this temp table is that for the columns with VARCHAR datatype, the default limit is 255. Our data often has columns with VARCHAR characters having length greater than 255 (e.g. description fields). Thus the job fails often with the error: String length exceeds DDL length.
We tried various transformation available in Glue visual studio to insert the varchar data more that 255 length.
Our expectation is to insert the description fields having more that 255 characters into the target Redshift table column. Any idea on how to achieve this?
I replicated the Setup on my machine to perform Upserts from S3 to Redshift in the test table.
Table structure:
CREATE TABLE public.test (
id integer ENCODE az64,
name character varying(256) ENCODE lzo,
salary bigint ENCODE az64,
jd character varying(300) ENCODE lzo
) DISTSTYLE AUTO;
Looks like a bug in the Glue Visual Studio, when I inserted more than 255 characters in the 'jd' column faced a similar error message.
Root Cause:- The issue with this temp table created in Preactions using Glue Visual Studio, it creates a default Varchar data type for the string columns with the default limit 255. Also, we do not have control over the temp table using Visual ETL approach.
Solution:- The current working solution will need to convert the Visual Job to Script, since we do not have a option to change the datatype size of temp table in the Visual mode.
1. Go to Job -> Navigate to Script -> Edit script
2. Update the Preaction statement's temp table create statement to change the Varchar size based on a number of characters you are expecting in Description columns as below:
"preactions": DROP TABLE IF EXISTS public.test_temp_df5c11; CREATE TABLE public.test_temp_df5c11 (id int, name VARCHAR, salary VARCHAR, jd VARCHAR(300));",
},
I was able to insert the data after manually editing the preactions statement.