Why does AWS Athena need a 'spill-bucket' when it writes results to the target S3 location?
WITH
( format = 'Parquet',
parquet_compression = 'SNAPPY',
external_location = 's3://target_bucket_name/my_data'
)
AS
WITH my_data_2
AS
(SELECT * FROM "existing_table"."generated_data" limit 10)
SELECT *
FROM my_data_2;
Since it already has a bucket to store the data, why does Athena need the spill-bucket, and what does it store there?
Trino/Presto developer here who was directly involved in Spill development.
In Trino (formerly known as PrestoSQL), the term "spill" refers to writing data that does not fit in memory out to disk. It is an opt-in feature that lets you process larger queries. Of course, if all your queries require spilling, it's more efficient to simply provision a bigger cluster with more memory, but the functionality is useful when large queries are rare.
Spilling involves saving temporary data, not the final query results. The spilled data is read back and deleted before the query completes execution.
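To make the lifecycle of spilled data concrete, here is a rough Python sketch of the general idea. It is not Trino's or Athena's actual implementation: the function name `spilling_sort` and the `max_rows_in_memory` limit are hypothetical, and a row count stands in for a real memory budget. It sorts rows, writes sorted runs to temporary files whenever the in-memory buffer fills up, reads them back to produce the result, and deletes them before returning.

```python
import heapq
import os
import pickle
import tempfile


def spilling_sort(rows, max_rows_in_memory):
    """Sort rows, spilling sorted runs to temp files when the buffer is full.

    Illustrative only: a row count stands in for a real memory budget.
    Returns the sorted rows and the paths of the (already deleted) spill files.
    """
    spill_paths = []
    buffer = []

    def spill(buf):
        # Write one sorted run to a temporary file and remember its path.
        fd, path = tempfile.mkstemp(prefix="spill-", suffix=".bin")
        with os.fdopen(fd, "wb") as f:
            for row in sorted(buf):
                pickle.dump(row, f)
        spill_paths.append(path)

    def read_run(path):
        # Stream a spilled run back, one row at a time.
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    try:
        for row in rows:
            buffer.append(row)
            if len(buffer) >= max_rows_in_memory:
                spill(buffer)
                buffer = []
        # Merge the spilled runs with whatever is still in memory.
        runs = [read_run(p) for p in spill_paths]
        runs.append(iter(sorted(buffer)))
        result = list(heapq.merge(*runs))
    finally:
        # Spill files are temporary: remove them before the "query" finishes.
        for path in spill_paths:
            os.remove(path)
    return result, spill_paths
```

Calling `spilling_sort([5, 3, 9, 1, 7, 2, 8], max_rows_in_memory=3)` spills two runs, and by the time it returns, neither spill file exists any more. The final result is the only thing that survives, which is the distinction between a spill location and a results location. A real engine would stream the merged output rather than collect it into a list.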