Search code examples
hiveclouderaddlcreate-tablehiveddl

Cloudera - Hive/Impala Show Create Table - Error with the syntax


I'm making some automatic processes to create tables on Cloudera Hive.

For that I am using the show create table statement that me give (for example) the following ddl:

CREATE TABLE clsd_core.factual_player (   player_name STRING,   number_goals INT ) PARTITIONED BY (   player_name STRING ) WITH SERDEPROPERTIES ('serialization.format'='1') STORED AS PARQUET LOCATION 'hdfs://nameservice1/factual_player'

What I need is to run the ddl on a different place to create a table with the same name.

However, when I run that code I return the following error:

Error while compiling statement: FAILED: ParseException line 1:123 missing EOF at 'WITH' near ')'

And I remove manually this part "WITH SERDEPROPERTIES ('serialization.format'='1')" it was able to create the table with success.

Is there a better function to retrieves the tables ddls without the SERDE information?


Solution

  • First issue in your DDL is that partitioned column should not be listed in columns spec, only in the partitioned by. Partition is the folder with name partition_column=value and this column is not stored in the table files, only in the partition directory. If you want partition column to be in the data files, it should be named differently.

    Second issue is that SERDEPROPERTIES is a part of SERDE specification, If you do not specify SERDE, it should be no SERDEPROPERTIES. See this manual: StorageFormat andSerDe

    Fixed DDL:

     CREATE TABLE factual_player (number_goals INT) 
     PARTITIONED BY (player_name STRING) 
     STORED AS PARQUET
     LOCATION 'hdfs://nameservice1/factual_player';
    

    STORED AS PARQUET already implies SERDE, INPUTFORMAT and OUPPUTFORMAT.

    If you want to specify SERDE with it's properties, use this syntax:

    CREATE TABLE factual_player(number_goals int)
    PARTITIONED BY (player_name string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    WITH SERDEPROPERTIES ('serialization.format'='1') --I believe you really do not need this
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    LOCATION 'hdfs://nameservice1/factual_player'