Search code examples
conditional-statementsimpalacreate-table

Create table statement and add condition to replace None values with NULL in IMPALA


Bellow is my create table statement. Some values are coming as "None" from the data source and want to add a condition to my CREATE TABLE statement to check if a value coming in equal "None" replace it with NULL. Is that possible with Impala without having an intermediate table and then ETL the data to the final TABLE with CASE STATMENT

CREATE TABLE IF NOT EXISTS customer_db.customers_table(
`customer_id` BIGINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
`ts` BIGINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
`customer_name` STRING NULL DEFAULT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
PRIMARY KEY (customer_id, ts)
)
PARTITION BY RANGE (`ts`)(PARTITION VALUE = 0)
STORED AS KUDU

Solution

  • Update: Import your data as CSV. Use Notepad++ to replace none to null Import it https://docs.cloudera.com/machine-learning/cloud/import-data/topics/ml-loading-csv-data-into-an-impala-table.html

    Try using create table as select query

    None is data it cannot be converted to null via create table command So just import data as is in temp table

    Then use create table as select columns then replace column with None to null in select query

    CREATE [EXTERNAL] TABLE [IF NOT EXISTS] db_name.]table_name [PARTITIONED BY (col_name[, ...])] [SORT BY ([column [, column ...]])] [COMMENT 'table_comment']

    • [ROW FORMAT row_format] [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
    • [STORED AS ctas_file_format] [LOCATION 'hdfs_path']
    • [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
      
      [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)] AS select_statement