Due to limited access and tooling, I am stuck using the redshift-data API.
Specifically, I am using boto3 to work with a Redshift table. I want to update columns on specific rows, but I have run into API limitations with the boto3 ExecuteStatement operation. I can't send an ExecuteStatement with an update query over 100 kB. I get the following exception:
Failed to query redshift for list of processed buckets, An error occurred (ValidationException) when calling the ExecuteStatement operation: Cannot process query string larger than 100kB
Am I using the best API call for the task at hand, or do I need to adjust the updates to be done in batches?
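For context, this is roughly the shape of the call that hits the limit (the table, column, cluster, and key names below are placeholders, not my real schema):

```python
import boto3

client = boto3.client("redshift-data")

# Placeholder list of keys; in practice this holds tens of thousands of entries,
# so the single generated UPDATE statement easily exceeds 100 kB.
processed_keys = ["bucket-%06d" % i for i in range(50_000)]

key_list = ", ".join("'%s'" % k for k in processed_keys)
sql = "UPDATE my_schema.buckets SET processed = true WHERE bucket_key IN (%s)" % key_list

print("query size: %.1f kB" % (len(sql.encode("utf-8")) / 1024))

# This call raises ValidationException: Cannot process query string larger than 100kB
response = client.execute_statement(
    ClusterIdentifier="my-cluster",
    Database="my_db",
    DbUser="my_user",
    Sql=sql,
)
```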
From AWS Support
"ValidationException: An error occurred (ValidationException) when calling the ExecuteStatement operation: Cannot process query string larger than 100kB"
As you correctly mention, the reason you are receiving this error is that the Amazon Redshift Data API has a maximum query statement size of 100 KB [1]. This is by design a hard limit on query size in the Redshift Data API and cannot be increased.
The recommendation to overcome this limitation is to reframe your query so that it stays within the limits of the Redshift Data API. One useful approach could be to move the query into a stored procedure and execute that, or, as you already mention, to adjust the updates so they are done in batches; a rough sketch of the batching approach follows.
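A minimal sketch of the batching approach, reusing the same placeholder table and key list as in the question; the chunk size is arbitrary and only needs to keep each statement under 100 kB:

```python
import boto3

client = boto3.client("redshift-data")

processed_keys = ["bucket-%06d" % i for i in range(50_000)]

def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

statement_ids = []
# 1,000 keys per statement keeps each generated query well below the 100 kB limit.
for chunk in chunks(processed_keys, 1_000):
    key_list = ", ".join("'%s'" % k for k in chunk)
    sql = ("UPDATE my_schema.buckets SET processed = true "
           "WHERE bucket_key IN (%s)" % key_list)
    response = client.execute_statement(
        ClusterIdentifier="my-cluster",
        Database="my_db",
        DbUser="my_user",
        Sql=sql,
    )
    statement_ids.append(response["Id"])

# Data API statements run asynchronously; poll describe_statement until each
# one reports FINISHED (or FAILED) before relying on the updates.
for statement_id in statement_ids:
    status = client.describe_statement(Id=statement_id)["Status"]
```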
However, if this is not a possible solution for you, another workaround could be to use a JDBC/ODBC client connection to submit your query. Kindly note that even with this option, the maximum size for a single Amazon Redshift SQL statement is 16 MB [2].
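For example, a direct connection with the redshift_connector Python driver (connection details below are placeholders) can submit the same update as one statement, subject only to the 16 MB statement limit rather than the Data API's 100 kB cap:

```python
import redshift_connector

# Connection details are placeholders; a JDBC/ODBC client works the same way.
conn = redshift_connector.connect(
    host="my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",
    database="my_db",
    user="my_user",
    password="my_password",
)

processed_keys = ["bucket-%06d" % i for i in range(50_000)]
key_list = ", ".join("'%s'" % k for k in processed_keys)
sql = "UPDATE my_schema.buckets SET processed = true WHERE bucket_key IN (%s)" % key_list

# Over a direct connection the relevant limit is the 16 MB maximum for a
# single Redshift SQL statement, not the Data API's 100 kB cap.
cursor = conn.cursor()
cursor.execute(sql)
conn.commit()
conn.close()
```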
[1] Data API calling considerations: https://docs.aws.amazon.com/redshift/latest/mgmt/data-api.html#data-api-calling-considerations
[2] Maximum size for a single Amazon Redshift SQL statement: https://docs.aws.amazon.com/redshift/latest/dg/c_redshift-sql.html