apache-spark parquet presto amazon-athena

HIVE_CANNOT_OPEN_SPLIT : Column <column_name> type null not supported

HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://path/to/file/<>.snappy.parquet : Column ai.ja type null not supported

This only happens when I define the "JA" column, which is a struct of strings. If I leave the column out, I can query without issues. The schema information was obtained from our Parquet file using Apache Spark.

The CREATE TABLE statement I'm using to reproduce the error follows:
  CREATE EXTERNAL TABLE <<tablename>>(
    `ai` struct <
      acs : varchar(100),
      ltc : varchar(100),
      primaryapplicant : struct <
        bwh : varchar(10),
        citizenship : varchar(20),
        currentaddresscity : varchar(50),
        currentaddressstate : varchar(50),
        currentaddressstreet2 : varchar(50),
        ss : varchar(50)>,
      JA : array < struct <
        dateofbirth : varchar(50),
        emailaddress : varchar(50),
        firstname : varchar(50),
        lastname : varchar(50),
        ss : varchar(50)>>,
      status : varchar(50),
      uri : varchar(50)>,
    `pr` struct < pc : struct < cn : varchar(50)>>,
    `product` array < struct < at : varchar(20), pi : varchar(50), pmn : varchar(256)>>,
    `ipt` varchar(40) )
  PARTITIONED BY ( `owner` varchar(40) )
  ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
  STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
  LOCATION 's3://<location>'
  TBLPROPERTIES (
    'compression_type' = 'snappy',
    'numRows' = '2',
    'transient_lastDdlTime' = <> )

The table reads from a Parquet file.

Parquet schema:
root
 |-- ai: struct (nullable = true)
 |    |-- acs: string (nullable = true)
 |    |-- JA: struct (nullable = true)
 |    |    |-- DateOfBirth: string (nullable = true)
 |    |    |-- EmailAddress: string (nullable = true)
 |    |    |-- FirstName: string (nullable = true)
 |    |    |-- LastName: string (nullable = true)
 |    |    |-- ss: string (nullable = true)
 |    |-- ltc: string (nullable = true)
 |    |-- PrimaryApplicant: struct (nullable = true)
 |    |    |-- bwh: string (nullable = true)
 |    |    |-- Citizenship: string (nullable = true)
 |    |    |-- CurrentAddressCity: string (nullable = true)
 |    |    |-- CurrentAddressState: string (nullable = true)
 |    |    |-- CurrentAddressStreet2: string (nullable = true)
 |    |    |-- ss: string (nullable = true)
 |    |-- Status: string (nullable = true)
 |    |-- uri: string (nullable = true)
 |-- pr: struct (nullable = true)
 |    |-- pc: struct (nullable = true)
 |    |    |-- cn: string (nullable = true)
 |-- Product: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- at: string (nullable = true)
 |    |    |-- pi: string (nullable = true)
 |    |    |-- pmn: string (nullable = true)
 |-- ipt: string (nullable = true)

The same issue was reported at https://forums.aws.amazon.com/thread.jspa?threadID=246551 but I still have not been able to figure it out.

Can anyone help?

Solution

This issue is resolved.

When creating an Athena table, every field must map exactly to the Parquet schema. That is, the fields, including those nested inside a struct such as `ai`, must be declared in the same order as they appear in the schema.
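
To avoid reordering nested fields by hand, the column type can be generated from the schema itself. The following is only a minimal Python sketch, not part of the original fix: `to_hive_type` and `AI_SCHEMA` are hypothetical names, the schema is transcribed by hand from the Parquet schema in the question, every leaf is declared as `string` instead of `varchar(n)`, and field names are lowercased on the assumption that the table should use lowercase names as most of the DDL above does. It also follows the Parquet schema in treating `JA` as a struct, whereas the DDL in the question declares it as an array of structs. Whether that difference also mattered is not something I verified.

    # Hypothetical helper: render a nested schema as a Hive/Athena type string,
    # keeping fields in exactly the order they appear in the Parquet schema.
    def to_hive_type(node):
        """node is a type name (str), ("array", element), or a list of
        (field_name, field_type) pairs describing a struct in file order."""
        if isinstance(node, str):
            return node
        if isinstance(node, tuple) and node[0] == "array":
            return "array<" + to_hive_type(node[1]) + ">"
        # Struct: emit fields in the given order, never sorted.
        fields = ",".join(
            f"{name.lower()}:{to_hive_type(child)}" for name, child in node
        )
        return "struct<" + fields + ">"

    # The `ai` column, transcribed from the Parquet schema in the question.
    AI_SCHEMA = [
        ("acs", "string"),
        ("JA", [
            ("DateOfBirth", "string"),
            ("EmailAddress", "string"),
            ("FirstName", "string"),
            ("LastName", "string"),
            ("ss", "string"),
        ]),
        ("ltc", "string"),
        ("PrimaryApplicant", [
            ("bwh", "string"),
            ("Citizenship", "string"),
            ("CurrentAddressCity", "string"),
            ("CurrentAddressState", "string"),
            ("CurrentAddressStreet2", "string"),
            ("ss", "string"),
        ]),
        ("Status", "string"),
        ("uri", "string"),
    ]

    # Prints the column definition to paste into CREATE EXTERNAL TABLE.
    print("`ai` " + to_hive_type(AI_SCHEMA))

The printed definition lists `acs`, `ja`, `ltc`, `primaryapplicant`, `status`, `uri` in that order, which is the order in the Parquet schema, not the order used in the failing DDL.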