
Hive table load in Parquet format


I have the input file below. I need to load this file into a Hive table in both ORC and Parquet format.

productID,productCode,name,quantity,price,supplierid
1001,PEN,Pen Red,5000,1.23,501
1002,PEN,Pen Blue,8000,1.25,501

I have pasted my code at the bottom. I am able to successfully create and load the ORC Hive table, but not the Parquet one.

After creating and loading the Parquet table, when I query it, I see only NULL values for all fields. Am I missing anything?

val productsupplies = sc.textFile("/user/cloudera/product.csv")
val productfirst = productsupplies.first   // header row
val product = productsupplies
  .filter(f => f != productfirst)          // skip the header
  .map { x =>
    val a = x.split(",")
    (a(0).toInt, a(1), a(2), a(3), a(4).toFloat, a(5))
  }
  .toDF("productID", "productCode", "name", "quantity", "price", "supplierid")




product.write.orc("/user/cloudera/productsupp.orc")
product.write.parquet("/user/cloudera/productsupp.parquet")


 val hc = new org.apache.spark.sql.hive.HiveContext(sc)

hc.sql("create table product_supp_orc ( " + 
"product_id int, " + 
"product_code string, " + 
"product_name string, " + 
"product_quatity string, " + 
"product_price float, " + 
"product_supplier_id string) stored as orc " + 
"location \"/user/cloudera/productsupp.orc\" ")





hc.sql("create table product_supp_parquet ( " + 
"product_id int, " + 
"product_code string, " + 
"product_name string, " + 
"product_quatity string, " + 
"product_price float, " + 
"product_supplier_id string) stored as parquet " + 
"location \"/user/cloudera/productsupp.parquet\" ")




hc.sql("select * from product_supp_parquet").show()

Solution

  • Try:

    hc.sql("create table product_supp_parquet ( " + 
    "productid int, " + 
    "productcode string, " + 
    "name string, " + 
    "quantity string, " + 
    "price float, " + 
    "supplierid string) stored as parquet " + 
    "location \"/user/cloudera/productsupp.parquet\" ")
    

    Basically, the column names in the Hive table definition must match the column names in the Parquet file (the ones you passed to `toDF`). Hive resolves Parquet columns by name, not by position, so any table column without a matching file column reads back as NULL.
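
    To see why every field came back NULL, compare the two sets of names. This is a small self-contained sketch (plain Scala, using the column names from the question) that flags table columns with no matching file column:

    ```scala
    // Column names as written by product.write.parquet(...)
    val fileColumns = Seq("productID", "productCode", "name",
                          "quantity", "price", "supplierid")

    // Column names declared in the original CREATE TABLE statement
    val tableColumns = Seq("product_id", "product_code", "product_name",
                           "product_quatity", "product_price", "product_supplier_id")

    // A table column with no same-named file column resolves to NULL on read
    val unmatched = tableColumns.filterNot(c => fileColumns.exists(_.equalsIgnoreCase(c)))
    println(s"Columns that will read as NULL: ${unmatched.mkString(", ")}")
    ```

    Here none of the six table columns matches a file column (even case-insensitively), which is why the original query returned NULL for every field.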