Search code examples
sqlamazon-redshiftamazon-athenaamazon-redshift-spectrum

data appears as null on redshift external table while working right on athena


So I'm trying to run the following simple query on redshift spectrum:

select * from company.vehicles where vehicle_id is not null

and it return 0 rows(all of the rows in the table are null). However when I run the same query on athena it works fine and return results. Tried msck repair but both athena and redshift are using the same metastore so it shouldn't matter. I also don't see any errors.

The format of the files is orc.

The create table query is:

CREATE EXTERNAL TABLE 'vehicles'(
  'vehicle_id' bigint, 
  'parent_id' bigint, 
  'client_id' bigint, 
  'assets_group' int, 
  'drivers_group' int)
PARTITIONED BY ( 
  'dt' string, 
  'datacenter' string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://company-rt-data/metadata/out/vehicles/'
TBLPROPERTIES (
  'CrawlerSchemaDeserializerVersion'='1.0', 
  'CrawlerSchemaSerializerVersion'='1.0',  
  'classification'='orc', 
  'compressionType'='none')

Any idea?


Solution

  • Eventually it turned out to be a bug in redshift. In order to fix it, we needed to run the following command:

    ALTER TABLE table_name SET TABLE properties(‘orc.schema.resolution’=‘position’);