Search code examples
amazon-web-servicesamazon-redshiftlistagg

Redshift Result size exceeds LISTAGG limit on svl_statementtext


Trying to reconstruct my query history from svl_statementtext using listagg.
Getting error :

Result size exceeds LISTAGG limit (limit: 65535)

However, I cannot see how or where I have exceeded limit.

My failing query :

SELECT pid,xid, min(starttime) AS starttime, 
  pg_catalog.listagg(
        CASE WHEN (len(rtrim(("text")::text)) = 0) THEN ("text")::text ELSE rtrim(("text")::text) END
        , ''::text
    ) WITHIN GROUP(ORDER BY "sequence") 
    AS query_statement 
FROM svl_statementtext
GROUP BY pid,xid
HAVING  min(starttime) >= '2022-06-27 10:00:00';

After the fail, I checked to see if I could find where the excessive size was coming from :

SELECT pid,xid, min(starttime) AS starttime, 
  SUM(OCTET_LENGTH(
      CASE WHEN (len(rtrim(("text")::text)) = 0) THEN ("text")::text ELSE rtrim(("text")::text) END
  )) as total_bytes
FROM svl_statementtext
GROUP BY pid,xid
HAVING  min(starttime) >= '2022-06-27 10:00:00'
ORDER BY total_bytes desc;

However the largest size that this query reports is 2962 So how/why is listagg complaining about 65535 ??

Have seen some other posts mentioning using listaggdistinct, and catering for when the value being aggregated is null, but none seem to change my problem.

Any guidance appreciated :)


Solution

  • The longest string that Redshift can hold is 64K bytes. Listagg() is likely generating a string longer than this. The "text" column in svl_statementtext is 200 characters so if you have more than 319 segments you can overflow this string size.

    The other issue I see is that your query will combine multiple statements into one string. You are only grouping by xid and pid which will give you all statements for a transaction. Add starttime to your group by list and this will break different statements into different results.

    Also remember that xid and pid values repeat every few days so have some date range limit can help prevent a lot of confusion.

    You need to add

    where sequence < 320
    

    to your query and also group by starttime.

    Here's a query I have used to put together statements in Redshift:

    select xid, pid, starttime, max(datediff('sec',starttime,endtime)) as runtime, type, listagg(regexp_replace(text,'\\\\n*',' ')) WITHIN GROUP (ORDER BY sequence) || ';' as querytext
    from svl_statementtext
    where pid = (SELECT pg_backend_pid()) --current session
        and sequence < 320
        and starttime > getdate() - interval '24 hours'
    group by starttime, 1, 2, "type" order by starttime, 1 asc, "type" desc ;