I'm trying to query a Hive view with Redshift Spectrum but it gives me this error:
SQL Error [500310] [XX000]: [Amazon](500310) Invalid operation: Assert
Details:
-----------------------------------------------
error: Assert
code: 1000
context: loc->length() > 5 && loc->substr(0, 5) == "s3://" -
query: 12103470
location: scan_range_manager.cpp:272
process: padbmaster [pid=1769]
-----------------------------------------------;
Is is possible to query Hive views from Redshift Spectrum? I'm using Hive Metastore (not Glue Data Catalog).
I wanted to have a view to restrict access to the original table, with a limited set of columns and partitions. And also because my original table (Parquet data) has some Map fields so I wanted to do something like that to make it easier to query from Redshift as Map fields are a bit complicated to deal with in Redshift:
CREATE view my_view AS
SELECT event_time, event_properties['user-id'] as user_id, event_properties['product-id'] as product_id, year, month, day
FROM my_events
WHERE event_type = 'my-event' -- partition
I can query the table my_events from Spectrum but it's a mess because properties is a Map field, not a Struct so I need to kind of explode it into several rows in Redshift.
Thanks
Looking at the error it seems Spectrum always looks for a S3 path when external tables and views are queried. This is valid for external tables because those will always have a location but views will never have an explicit S3 location.
Error type -> Assert
Error context -> context: loc->length() > 5 && loc->substr(0, 5) == "s3://"
In case of a hive view,
loc->length()
will return 0, and the whole statement will return False
and result in assertion error.
Confirmation for this could be the second clause:
loc->substr(0, 5) == "s3://"
It is expecting the location to be a S3 path and if we count number of chars in "s3://"
it is 5
, which also confirms the first clause :
loc->length() > 5
Looks like Spectrum does not support Hive Views (or in general any object without an explicit S3 path)