hadoop apache-spark hive parquet

How to store special characters in Hive?


I've been playing around with Spark, Hive and Parquet. I have some data in my Hive table, and here is what it looks like (warning: French ahead):

Bloqu� � l'arriv�e      NULL
Probl�me de connexion   Bloqu� en hub

Obviously there's something wrong here.

What I do is: I read a Teradata table as a DataFrame with Spark, store it as a Parquet file, and then use that file as the data for a Hive table. Here's my CREATE TABLE script:

CREATE TABLE `table`(
   `lib` VARCHAR(255),
   `libelle_sous_cause` VARCHAR(255)
   )
 STORED AS PARQUET
 LOCATION
   'hdfs://location';

I don't really know what causes this. It might be an encoding problem somewhere between Teradata > Parquet or Parquet > Hive, but I'm not sure.

Any help will be appreciated, thanks.


Solution

  • I figured it out: the solution was simply to use STRING instead of VARCHAR:

    CREATE TABLE `table`(
       `lib` STRING,
       `libelle_sous_cause` STRING
       )
     STORED AS PARQUET
     LOCATION
       'hdfs://location';
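
To confirm that a change like this really removed the corruption, one option is to scan the column values for U+FFFD, the replacement character that shows up as "�" in the output above. The following is only a minimal sketch in plain Python: the helper names `find_corrupted` and `simulate_misdecoding` are hypothetical, and the Latin-1-bytes-read-as-UTF-8 step is just an assumption used to produce sample data, not a confirmed diagnosis of what happened in the Teradata > Parquet > Hive path.

```python
# U+FFFD is what a UTF-8 decoder emits for bytes it cannot decode.
REPLACEMENT_CHAR = "\ufffd"


def find_corrupted(values):
    """Return the values that contain at least one U+FFFD character.

    `values` is any iterable of strings or None (NULL cells are skipped).
    """
    return [v for v in values if v is not None and REPLACEMENT_CHAR in v]


def simulate_misdecoding(text):
    """Encode text as Latin-1, then decode those bytes as UTF-8.

    Assumption for illustration only: this is one common way to end up
    with U+FFFD, not necessarily what happened in the pipeline above.
    """
    return text.encode("latin-1").decode("utf-8", errors="replace")


if __name__ == "__main__":
    clean = ["Bloqué à l'arrivée", None, "Problème de connexion"]
    broken = [simulate_misdecoding(v) if v is not None else None for v in clean]

    print(find_corrupted(clean))   # nothing is flagged in the clean rows
    print(find_corrupted(broken))  # both non-NULL rows are flagged
```

The same check could be run on a sample of rows pulled from the Hive table before and after switching the column type, which shows whether the characters survive the round trip.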