I have seen posts discussing the usage of window functions, but I have some questions.
How is it possible to run HiveQL with window functions here? I tried
df.registerTempTable("data")
from pyspark.sql import functions as F
from pyspark.sql import Window
%%hive
SELECT col1, col2, F.rank() OVER (Window.partitionBy("col1").orderBy("col3")
FROM data
and native Hive SQL
SELECT col1, col2, RANK() OVER (PARTITION BY col1 ORDER BY col3) FROM data
but neither of them works.
How can I switch between SQLContext and HiveContext, given that I am already using SQLContext?
You cannot. Spark DataFrames and temporary tables are bound to the specific context that created them. If you want to use HiveContext, then use it all the way; you pull in all the Hive dependencies anyway.
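To illustrate the first point, here is a minimal sketch (assuming Spark 1.x, where SQLContext and HiveContext are separate classes) showing that a temporary table registered through one context is not visible from another:

from pyspark import SparkContext
from pyspark.sql import SQLContext, HiveContext

sc = SparkContext()
sqlContext = SQLContext(sc)
hiveContext = HiveContext(sc)

df = sqlContext.createDataFrame([(1, 2, 3)], ["col1", "col2", "col3"])
df.registerTempTable("data")  # registered in sqlContext's catalog only

sqlContext.sql("SELECT * FROM data").show()  # works
hiveContext.sql("SELECT * FROM data")        # fails: temp table not visible here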
How is it possible to run HiveQL with window functions here?
from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)  # sc is an existing SparkContext
sqlContext.sql(query)
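For example, an end-to-end sketch (the column names col1, col2, col3 come from your query; the sample data is made up):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
sqlContext = HiveContext(sc)

df = sqlContext.createDataFrame(
    [("a", 1, 3), ("a", 2, 1), ("b", 5, 2)],
    ["col1", "col2", "col3"])
df.registerTempTable("data")

sqlContext.sql("""
    SELECT col1, col2,
           RANK() OVER (PARTITION BY col1 ORDER BY col3) AS rnk
    FROM data
""").show()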
The first query you use is simply invalid: it mixes Python API calls (F.rank(), Window.partitionBy) into an SQL string, which HiveQL cannot parse. The second one should work if you use the correct context and configuration.
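If you would rather stay in the DataFrame API, the first attempt can be rewritten with Window and Column.over; note that in Spark < 2.0 window functions require a HiveContext even through the DataFrame API. A sketch, using the same df as above:

from pyspark.sql import Window
from pyspark.sql import functions as F

w = Window.partitionBy("col1").orderBy("col3")
df.select("col1", "col2", F.rank().over(w).alias("rnk")).show()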