I want to use AWS Spectrum - Querying on Redshift based on a file in S3.
Since you can either choose a folder in S3 or a JSON file, I opted to use a JSON file as the location.
The bug:
When I reference the file test
in a folder - Redshift works perefectly.
When I reference the JSON file
which references the file test
and I try to do a query that runs on the whole file, it fails:
select count(*)
from test
The error: Unexpected end of compressed file. File may be truncated.
The weird thing is - if I only query X amount of rows, it works:
select *
from test
limit 100;
This means that the columns of the table I created (when I created the external table) are correct. Am I missing something in the JSON file? Here is how I wrote it (taken from AWS Docs):
{
"entries": [
{"url":"s3://s3_bucket/SpectrumTest/test.gz", "meta": { "content_length": 4668469 } }
]
}
content_length
is the size of the file in bytes
. Because the file changed but JSON manifest remained with the old content_length, it didn't load the whole file resulting in the error about the end of the compressed file.
The reason doing limit 100
worked is because DBeaver saved the results in cache.
Hope this helps anyone else in the future.