I've been playing around with Spark, Hive and Parquet. I have some data in my Hive table, and here is what it looks like (warning: French ahead):
Bloqu� � l'arriv�e NULL
Probl�me de connexion Bloqu� en hub
Obviously there's something wrong here.
What I do is: I read a Teradata table as a DataFrame with Spark, store it as a Parquet file, and then use this file to store the data in Hive. Here's my create table script:
CREATE TABLE `table`(
`lib` VARCHAR(255),
`libelle_sous_cause` VARCHAR(255)
)
STORED AS PARQUET
LOCATION
'hdfs://location';
I don't really know what causes this. It might be an encoding issue somewhere between Teradata > Parquet or Parquet > Hive, but I'm not sure.
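To show the kind of encoding mismatch I have in mind, here is a minimal Python sketch. It is only an illustration of how this sort of output can arise in general, not a diagnosis of my pipeline: the sample string and the assumption that the bytes were written as Latin-1 and read back as UTF-8 are hypothetical.

```python
# Hypothetical illustration: not taken from the Teradata/Spark/Hive pipeline itself.
original = "Bloqué à l'arrivée"

# Assumption: the text was written as Latin-1 (ISO-8859-1) bytes somewhere upstream.
latin1_bytes = original.encode("latin-1")

# Reading those bytes as UTF-8 fails on each accented character; with
# errors="replace" every invalid byte becomes U+FFFD, the replacement character.
garbled = latin1_bytes.decode("utf-8", errors="replace")
print(garbled)

# Decoding with the matching charset recovers the text intact.
recovered = latin1_bytes.decode("latin-1")
print(recovered)
```

If something like this is happening, the accented characters are lost at the point where the bytes are interpreted with the wrong charset, while the plain ASCII characters come through unchanged.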
Any help will be appreciated, thanks.
I figured it out: the solution was simply to use STRING instead of VARCHAR:
CREATE TABLE `table`(
`lib` STRING,
`libelle_sous_cause` STRING
)
STORED AS PARQUET
LOCATION
'hdfs://location';