I have a table employee_1 in Spark with attributes id and name (with data), and another table, employee_2, with the same attributes. I want to load the data into employee_2, increasing each id by 1:
WITH EXP AS (SELECT ALIASNAME.ID+1 ID, ALIASNAME.NAME NAME FROM employee_1 ALIASNAME)
INSERT INTO TABLE employee_2 SELECT * FROM EXP;
I have a file (with data) in an HDFS location. When I run this with a test program from the backend, it succeeds, but no data is loaded: employee_2 stays empty.
Note:
If you run the above WITH clause in Hive, it succeeds and the data loads, but in Spark 1.6 it doesn't. Why?
The WITH clause is not the problem; it's the INSERT INTO statement that's causing trouble. Here's a working example that uses the .insertInto() style instead of the "INSERT INTO" SQL:
// Spark 1.6 shell: sqlContext (a HiveContext) and its implicits are in scope
import sqlContext.implicits._

val s = Seq((1, "foo"), (2, "bar"))
val df = s.toDF("id", "name")
df.registerTempTable("df")

sql("CREATE TABLE edf_final (id int, name string)")
// Alias the computed column so the schema matches the target table
val e = sql("WITH edf AS (SELECT id+1 AS id, name FROM df) SELECT * FROM edf")
e.insertInto("edf_final")  // deprecated in 1.4+; e.write.insertInto("edf_final") also works
Another option is to use the df.write.mode("append").saveAsTable("edf_final") style.
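For completeness, a minimal sketch of that variant, under the same assumptions as the example above (a Spark 1.6 HiveContext bound to `sqlContext`, as in spark-shell, and an existing target table `edf_final`); the sample data here is made up for illustration:

```scala
// Sketch: append rows to a Hive table via the DataFrameWriter API
// instead of an "INSERT INTO" SQL statement.
import sqlContext.implicits._

val bumped = Seq((3, "baz"), (4, "qux"))
  .toDF("id", "name")
  .selectExpr("id + 1 AS id", "name")  // same +1 shift as the WITH query

// mode("append") adds to edf_final rather than erroring because it exists
bumped.write.mode("append").saveAsTable("edf_final")
```

Note the difference: saveAsTable will create the table if it doesn't exist and resolves columns by name, while insertInto requires the table to exist and matches columns by position.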
Relevant SO: "INSERT INTO ..." with SparkSQL HiveContext